Abstract

The problem of side-information scalable (SI-scalable) source coding is considered in this work, where the
encoder constructs a progressive description, such that the receiver with high quality side information will be
able to truncate the bitstream and reconstruct in the rate distortion sense, while the receiver with low quality
side information will have to receive further data in order to decode. We provide inner and outer bounds for
general discrete memoryless sources. The achievable region is shown to be tight for the case that either of the
decoders requires a lossless reconstruction, as well as the case with degraded deterministic distortion measures.
Furthermore we show that the gap between the achievable region and the outer bounds can be bounded by a
constant when the squared error distortion measure is used. The notion of perfectly scalable coding is introduced as
both the stages operate on the Wyner-Ziv bound, and necessary and sufficient conditions are given for sources
satisfying a mild support condition. Using SI-scalable coding and successive refinement Wyner-Ziv coding as
basic building blocks, a complete characterization is provided for the important quadratic Gaussian source with
multiple jointly Gaussian side-informations, where the side information quality does not have to be monotonic
along the scalable coding order. A partial result is provided for the doubly symmetric binary source with Hamming
distortion when the worse side information is a constant, for which one of the outer bounds is strictly tighter than
the other one.

I. Introduction

Consider the following scenario where a server is to broadcast multimedia data to multiple users with different
side informations, however the side informations are not available at the server. A user may have such strong side
information that only minimal additional information is required from the server to satisfy a fidelity criterion, or
a user may have barely any side information and expect the server to provide virtually everything to satisfy a
(possibly different) fidelity criterion.

A naive strategy is to form a single description and broadcast it to all the users, who can decode only after 
receiving it completely regardless of the quality of their individual side informations. However, for the users 
with good-quality side information (who will simply be referred to as the good users), most of the information 
received is redundant, which introduces a delay caused simply by the existence of users with poor-quality side 
informations (referred to as the bad users) in the network. It is natural to ask whether an opportunistic method 
exists, i.e., whether it is possible to construct a two-layer description, such that the good users can decode with 
only the first layer, and the bad users receive both the first and the second layer to reconstruct. Moreover, it is 
of importance to investigate whether such a coding order introduces any performance loss. We call this coding 
strategy side-information scalable (SI-scalable) source coding, since the scalable coding direction is from the
good users to the bad users. In this work, we consider mostly two-layer systems, except the quadratic Gaussian
source for which the solution to the general multi-layer problem is given.

Side-information Scalable Source Coding:
A network system-wise perspective

Fig. 1. The SR-WZ system vs. the SI-scalable system.

This work is related to the successive refinement problem, where a source is to be encoded in a scalable manner
to satisfy different distortion requirements at each individual stage. This problem was studied by Koshelev [1], and
by Equitz and Cover [2]; a complete characterization of the rate-distortion region can be found in [3]. Another
related problem is the rate-distortion for source coding with side information at the decoder [4], for which Wyner
and Ziv provided a conclusive result (now widely known as the Wyner-Ziv problem). Steinberg and Merhav [5]
recently extended the successive refinement problem to the Wyner-Ziv setting (SR-WZ), when the second stage
side information $Y_2$ is better than that of the first stage $Y_1$, in the sense that $X \leftrightarrow Y_2 \leftrightarrow Y_1$ forms a Markov string.
The extension to multistage systems with degraded side informations in such a direction was recently completed
in [6]. Also relevant is the work by Heegard and Berger [7] (see also [8]), where the problem of source coding
when side information may be present at the decoder was considered; the result was extended to the multistage
case when the side informations are degraded. This is quite similar to the problem being considered here and in
[5], [6], however without the scalable coding requirement.

Both the SR-WZ [5], [6] and SI-scalable problems can be thought of as special cases of the problem of scalable
source coding with no specific structure imposed on the decoder SI; this general problem appears to be quite
difficult, since even without the scalable requirement, a complete solution to the problem has not been found [7].
Here we emphasize that the SR-WZ and the SI-scalable problems are quite different in terms of their applications,
though they seem similar since only the order of SI quality is reversed. Roughly speaking, in the SI-scalable
problem, the side information $Y_2$ at the later stage is worse than the side information $Y_1$ at the early stage, while
in the SR-WZ problem, the order is reversed. In more mathematically precise terms, for the SI-scalable problem,
the side informations are degraded as $X \leftrightarrow Y_1 \leftrightarrow Y_2$, in contrast to the SR-WZ problem where the reversed
order is specified as $X \leftrightarrow Y_2 \leftrightarrow Y_1$. The two problems are also different in terms of their possible applications.
The SR-WZ problem is more applicable for a single server-user pair, when the user is receiving side information
through another channel, and at the same time receiving the description(s) from the server; for this scenario, two
decoders can be extracted to provide a simplified model. On the other hand, the SI-scalable problem is more
applicable when multiple users exist in the network, and the server wants to provide a scalable description, such
that the good user is not jeopardized unnecessarily (see Fig. 1).

It is also worth pointing out that Heegard and Berger showed when the scalable coding requirement is removed,
the optimal encoding by itself is in fact naturally progressive from the bad user to the good one; as such, the
SI-scalable problem is expected to be more difficult than the SR-WZ problem, since the encoding order is reversed
from the natural one. This difficulty is encapsulated by the fact that in the SR-WZ ordering the decoder with
better SI is able to decode whatever message was meant for the decoder with worse SI and hence the first stage
can be maximally useful. However, in the SI-scalable problem an additional tension exists in the sense that the
second-stage decoder will need extra information to disambiguate the information of the first stage.

The problem is well understood for the lossless case. The key difference from the lossy case is that the
quality of the side informations can be naturally determined by the value of $H(X|Y)$. By the seminal work of
Slepian and Wolf [9], $H(X|Y)$ is the minimum rate of encoding $X$ losslessly with side information $Y$ at the
decoder, thus in a sense a larger $H(X|Y)$ corresponds to weaker side information. If $H(X|Y_1) < H(X|Y_2)$,
then the rate $(R_1, R_2) = (H(X|Y_1), H(X|Y_2) - H(X|Y_1))$ is achievable, as noticed by Feder and Shulman
[10]. Extending this observation and a coding scheme in [11], Draper [12] proposed a universal incremental
Slepian-Wolf coding scheme when the distribution is unknown, which inspired Eckford and Yu [13] to design
rateless Slepian-Wolf LDPC codes. For the lossless case, there is no loss of optimality by using a scalable coding
approach; an immediate question is to ask whether the same is true for the lossy case in terms of rate distortion,
which we will show to be not so in general. In this rate-distortion setting, the order of goodness by the value
of $H(X|Y)$ is not sufficient because of the presence of the distortion constraints. This motivates the Markov
condition $X \leftrightarrow Y_1 \leftrightarrow Y_2$ introduced for the SI-scalable coding problem. Going further along this point of view,
the SI-scalable problem is also applicable in the single user setting, when the source encoder does not know
exactly which side information the receiver has within a given set. Therefore it can be viewed as a special case
of the side-information universal rate distortion coding.
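
As a small numerical illustration of the lossless rate pair $(H(X|Y_1), H(X|Y_2) - H(X|Y_1))$ discussed above, the following sketch evaluates it for an assumed binary model that is not taken from the paper: $X$ is uniform on $\{0,1\}$, $Y_1$ is $X$ observed through a binary symmetric channel with crossover probability `p1`, and $Y_2$ is $Y_1$ observed through a second such channel with crossover probability `p2`, so that $X \leftrightarrow Y_1 \leftrightarrow Y_2$ holds. The function names and the parameter values are hypothetical.

```python
import math


def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def cascade(p1, p2):
    """Crossover probability of two cascaded binary symmetric channels."""
    return p1 * (1.0 - p2) + p2 * (1.0 - p1)


def incremental_lossless_rates(p1, p2):
    """Return (R1, R2) = (H(X|Y1), H(X|Y2) - H(X|Y1)) for the assumed model."""
    h1 = binary_entropy(p1)               # H(X|Y1) for uniform X through BSC(p1)
    h2 = binary_entropy(cascade(p1, p2))  # H(X|Y2) through the cascaded channel
    return h1, h2 - h1


r1, r2 = incremental_lossless_rates(0.05, 0.1)
print(f"R1 = {r1:.4f} bits, R2 = {r2:.4f} bits, R1 + R2 = {r1 + r2:.4f} bits")
```

Under this assumed model the sum rate equals $H(X|Y_2)$, which is what a non-scalable code for the weaker decoder alone would need, consistent with the statement that scalability costs nothing in the lossless case.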

In this work, we formulate the problem of side information scalable source coding, and provide two inner 
bounds and two outer bounds for the rate-distortion region. One of the inner-bounds has the same distortion and 
rate expressions as one of the outer bounds, and they differ in the domain of optimization only by a Markov string 
requirement. Though the inner and the outer bounds do not coincide in general, the inner bounds are indeed tight 
for the case when either the first stage or the second stage requires a lossless reconstruction, as well as for the
case when certain deterministic distortion measures are taken. Furthermore, a conclusive result is given for the
quadratic Gaussian source with any finite number of stages and arbitrary correlated Gaussian side informations. 

With this set of inner and outer bounds, the problem of perfect scalability is investigated, defined as when
both of the layers can achieve the corresponding Wyner-Ziv bounds; this is similar to the notion of (strict)
successive refinability in the SR-WZ problem [5], [6]. Necessary and sufficient conditions are derived for general
discrete memoryless sources to be perfectly scalable under a mild support condition. By using the tool of
rate-loss introduced by Zamir [14], we further show that the gap between the inner bounds and the outer bounds
is bounded by a constant when the squared error distortion measure is used, and thus the inner bounds are "nearly
sufficient", in the sense as given in [15].

In addition to the result for the Gaussian source, a partial result is provided for the doubly symmetric binary
source (DSBS) with Hamming distortion measure when the second stage does not have side information, for
which the inner bounds and outer bounds coincide in certain distortion regimes. It is shown that one of the outer
bounds can be strictly better than the other for this source.

The rest of the paper is organized as follows. In Section II we define the problem and establish the notation.
In Section III we provide inner and outer bounds to the rate-distortion region and show that the bounds coincide
in certain special cases. The notion of perfectly scalable coding is introduced in Section IV together with the example
of a binary source. The rate loss method is applied in Section V to show the gap between the inner bound and
the outer bounds is bounded. In Section VI the Gaussian source is treated within a more general setting. We conclude
the paper in Section VII.

II. Notation and Preliminaries 

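Let $\mathcal{X}$ be a finite set and let $\mathcal{X}^n$ be the set of all $n$-vectors with components in $\mathcal{X}$. Denote an arbitrary member
of $\mathcal{X}^n$ as $x^n = (x_1, x_2, \ldots, x_n)$, or alternatively as $\boldsymbol{x}$. Upper case is used for random variables and vectors. A
discrete memoryless source (DMS) $(\mathcal{X}, P_X)$ is an infinite sequence $\{X_i\}_{i=1}^{\infty}$ of independent copies of a random
variable $X$ in $\mathcal{X}$ with a generic distribution $P_X$ with $P_X(x^n) = \prod_{i=1}^{n} P_X(x_i)$. Similarly, let $(\mathcal{X}, \mathcal{Y}_1, \mathcal{Y}_2, P_{XY_1Y_2})$
be a discrete memoryless three-source with generic distribution $P_{XY_1Y_2}$; the subscript will be dropped when it is
clear from the context as $P(X, Y_1, Y_2)$.

Let $\hat{\mathcal{X}}_1$ and $\hat{\mathcal{X}}_2$ be finite reconstruction alphabets. Let $d_j : \mathcal{X} \times \hat{\mathcal{X}}_j \to [0, \infty)$, $j = 1, 2$ be two distortion
measures. The single-letter distortion extension of $d_j$ to vectors is defined as

$$d_j(\boldsymbol{x}, \hat{\boldsymbol{x}}) = \frac{1}{n} \sum_{i=1}^{n} d_j(x_i, \hat{x}_i), \quad \forall \boldsymbol{x} \in \mathcal{X}^n, \ \hat{\boldsymbol{x}} \in \hat{\mathcal{X}}_j^n, \ j = 1, 2. \tag{1}$$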

Definition 1: An $(n, M_1, M_2, D_1, D_2)$ rate distortion (RD) SI-scalable code for source $X$ with side information
$(Y_1, Y_2)$ consists of two encoding functions $\phi_i$ and two decoding functions $\psi_i$, $i = 1, 2$:

$$\phi_1 : \mathcal{X}^n \to I_{M_1}, \quad \phi_2 : \mathcal{X}^n \to I_{M_2}, \tag{2}$$

$$\psi_1 : I_{M_1} \times \mathcal{Y}_1^n \to \hat{\mathcal{X}}_1^n, \quad \psi_2 : I_{M_1} \times I_{M_2} \times \mathcal{Y}_2^n \to \hat{\mathcal{X}}_2^n, \tag{3}$$

where $I_k = \{1, 2, \ldots, k\}$, such that

$$E d_1(X^n, \psi_1(\phi_1(X^n), Y_1^n)) \le D_1, \tag{4}$$

$$E d_2(X^n, \psi_2(\phi_1(X^n), \phi_2(X^n), Y_2^n)) \le D_2, \tag{5}$$

where $E$ is the expectation operation.

Definition 2: A rate pair $(R_1, R_2)$ is said to be $(D_1, D_2)$-achievable for SI-scalable encoding with side
information $(Y_1, Y_2)$, if for any $\epsilon > 0$ and sufficiently large $n$, there exists an $(n, M_1, M_2, D_1 + \epsilon, D_2 + \epsilon)$
RD SI-scalable code, such that $R_1 + \epsilon \ge \frac{1}{n} \log(M_1)$ and $R_2 + \epsilon \ge \frac{1}{n} \log(M_2)$.

¹In the rest of the paper, decoder one, respectively decoder two, will also be referred to as the first stage decoder, respectively second
stage decoder, depending on the context.

Denote the collection of all the $(D_1, D_2)$-achievable rate pairs $(R_1, R_2)$ for SI-scalable encoding as $\mathcal{R}(D_1, D_2)$,
and we seek to characterize this region when $X \leftrightarrow Y_1 \leftrightarrow Y_2$ forms a Markov string (see similar but
different degradedness conditions in [5], [6]). The Markov condition in effect specifies the goodness of the
side informations.

The rate-distortion function for degraded side-informations was established in [7] for the non-scalable coding
problem. In light of the discussion in Section I, it gives a lower bound on the sum-rate for any RD SI-scalable
code. More precisely, in order to achieve distortion $D_1$ with side information $Y_1$, and achieve distortion $D_2$ with
side information $Y_2$, when $X \leftrightarrow Y_1 \leftrightarrow Y_2$, the rate-distortion function is

$$R_{HB}(D_1, D_2) = \min_{p(D_1, D_2)} \left[ I(X; W_2 | Y_2) + I(X; W_1 | W_2, Y_1) \right], \tag{6}$$

where $p(D_1, D_2)$ is the set of all random variables $(W_1, W_2) \in \mathcal{W}_1 \times \mathcal{W}_2$ jointly distributed with the generic
random variables $(X, Y_1, Y_2)$, such that the following conditions are satisfied²: (i) $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$
is a Markov string; (ii) $\hat{X}_1 = f_1(W_1, Y_1)$ and $\hat{X}_2 = f_2(W_2, Y_2)$ satisfy the distortion constraints. Notice that the
rate distortion function $R_{HB}(D_1, D_2)$ given above suggests an encoding and decoding order from the bad user to
the good user.

Wyner and Ziv [4] showed that under the following quite general assumption that the distortion measure is
chosen in the set $\Gamma_d$ defined as

$$\Gamma_d = \{ d(\cdot, \cdot) : d(x, x) = 0, \text{ and } d(x, \hat{x}) > 0 \text{ if } \hat{x} \ne x \}, \tag{7}$$

then the rate distortion function satisfies $R^*_{X|Y}(0) = H(X|Y)$, where $R^*_{X|Y}(D)$ is the well-known Wyner-Ziv
rate distortion function with side information $Y$. If the same assumption is made on the distortion measure
$d_1(\cdot, \cdot) \in \Gamma_d$, then we can easily show (using an argument similar to the remark (3) in [4]) that

$$R_{HB}(0, D_2) = \min_{p(D_2)} \left[ I(X; W_2 | Y_2) + H(X | W_2, Y_1) \right], \tag{8}$$

where $p(D_2)$ is the set of all random variables $W_2$ such that $W_2 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ is a Markov string, and
$\hat{X}_2 = f_2(W_2, Y_2)$ satisfies the distortion constraint.

²This form is slightly different from the one in [7] where $f_1$ was defined as $f_1(W_1, W_2, Y_1)$, but it is straightforward to verify that
they are equivalent. The cardinality bound is also ignored, which is not essential here.

III. Inner and Outer Bounds 

To provide intuition into the SI-scalable problem, we first examine a simple Gaussian source under the
mean squared error (MSE) distortion measure, and describe the coding schemes informally.

Let $X \sim \mathcal{N}(0, \sigma_x^2)$ and $Y_1 = Y = X + N$, where $N \sim \mathcal{N}(0, \sigma_N^2)$ is independent of $X$; $Y_2$ is simply a constant,
i.e., no side information at the second decoder. $X \leftrightarrow Y_1 \leftrightarrow Y_2$ is indeed a Markov string. To avoid lengthy
discussion on degenerate regimes, assume $\sigma_N^2 \approx \sigma_x^2$, and consider only the following extreme cases.

• $\sigma_x^2 \gg D_1 \gg D_2$: It is known that binning with a Gaussian codebook, generated using a single-letter mechanism
(i.e., as an i.i.d. product distribution of the single-letter form) as $W_1 = X + Z_1$, where $Z_1$ is a zero-mean
Gaussian random variable independent of $X$ such that $D_1 = E[X - E(X|Y, W_1)]^2$, is optimal for Wyner-Ziv
coding. This coding scheme can still be used for the first stage. In the second stage, by direct enumeration
in the list of possible codewords in the particular bin specified in the first stage, the exact codeword can be
recovered by decoder two, who does not have any side information. Since $D_1 \gg D_2$, $W_1$ alone is not
sufficient to guarantee a distortion $D_2$, i.e., $D_2 \ll E[X - E(X|W_1)]^2$. Thus a successive refinement codebook,
say using a Gaussian random variable $W_2$ conditioned on $W_1$ such that $D_2 = E[X - E(X|W_1, W_2)]^2$, is
needed. This leads to the achievable rates:

$$R_1 \ge I(X; W_1 | Y), \quad R_1 + R_2 \ge I(X; W_1 | Y) + I(W_1; Y) + I(X; W_2 | W_1) = I(X; W_1, W_2). \tag{9}$$

• $\sigma_x^2 \gg D_2 \gg D_1$: If we choose $W_1 = X + Z_1$ such that $D_1 = E[X - E(X|Y, W_1)]^2$ and use the coding
method in the previous case, then since $D_2 \gg D_1$, $W_1$ is sufficient to achieve distortion $D_2$, i.e.,
$D_2 \gg E[X - E(X|W_1)]^2$. The rate needed for the enumeration is $I(W_1; Y)$, and it is rather wasteful since $W_1$
is more than we need. To solve this problem, we construct a coarser description using random variable
$W_2 = X + Z_1 + Z_2$, such that $D_2 = E[X - E(X|W_2)]^2$. The encoding process has three effective layers
for the needed two stages: (i) the first layer uses Wyner-Ziv coding with codewords generated by $P_{W_2}$; (ii)
the second layer uses successive refinement Wyner-Ziv coding with $P_{W_1|W_2}$; (iii) the third layer enumerates
the specific $W_2$ codeword within the first layer bin. Note that the first two layers form an SR-WZ scheme
with identical side information $Y$ at the decoder. For decoding, decoder one decodes the first two layers
with side information $Y$, while decoder two decodes the first and the third layer without side information.
By the Markov string $X \leftrightarrow W_1 \leftrightarrow W_2$, this scheme gives the following rates:

$$\begin{aligned}
R_1 &\ge I(X; W_1, W_2 | Y) = I(X; W_1 | Y), \\
R_1 + R_2 &\ge I(X; W_1 | Y) + I(W_2; Y) = I(X; W_2) + I(X; W_1 | Y, W_2).
\end{aligned} \tag{10}$$
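
The rate expressions of the second case can be evaluated in closed form for jointly Gaussian variables. The sketch below is only an illustration under stated assumptions: it uses the standard Gaussian MMSE and mutual information formulas, measures rates in bits, and picks arbitrary variances and distortion targets; the function name `gaussian_case2_rates` and all parameter names are hypothetical. It computes both sides of the sum-rate identity for $W_1 = X + Z_1$ and $W_2 = X + Z_1 + Z_2$, with $I(W_2; Y)$ obtained directly from the correlation coefficient of $W_2$ and $Y$.

```python
import math


def gaussian_case2_rates(sx2, sn2, d1, d2):
    """Evaluate the two sum-rate expressions, in bits, for W1 = X + Z1, W2 = X + Z1 + Z2."""
    var_x_given_y = 1.0 / (1.0 / sx2 + 1.0 / sn2)  # MMSE of X from Y alone
    if not (0.0 < d1 < var_x_given_y and 0.0 < d2 < sx2):
        raise ValueError("distortion targets outside the non-degenerate regime")
    q1 = 1.0 / (1.0 / d1 - 1.0 / sx2 - 1.0 / sn2)  # Var(Z1): MMSE from (Y, W1) equals d1
    q = 1.0 / (1.0 / d2 - 1.0 / sx2)               # Var(Z1 + Z2): MMSE from W2 equals d2
    if q < q1:
        raise ValueError("need Var(Z2) >= 0, i.e. W2 must be coarser than W1")
    var_x_given_y_w2 = 1.0 / (1.0 / sx2 + 1.0 / sn2 + 1.0 / q)
    i_x_w1_given_y = 0.5 * math.log2(var_x_given_y / d1)
    rho2 = sx2 * sx2 / ((sx2 + q) * (sx2 + sn2))   # squared correlation of W2 and Y
    i_w2_y = -0.5 * math.log2(1.0 - rho2)
    i_x_w2 = 0.5 * math.log2(sx2 / d2)
    i_x_w1_given_y_w2 = 0.5 * math.log2(var_x_given_y_w2 / d1)
    return {
        "R1": i_x_w1_given_y,
        "sum_rate_lhs": i_x_w1_given_y + i_w2_y,      # I(X;W1|Y) + I(W2;Y)
        "sum_rate_rhs": i_x_w2 + i_x_w1_given_y_w2,   # I(X;W2) + I(X;W1|Y,W2)
    }


rates = gaussian_case2_rates(sx2=1.0, sn2=1.0, d1=0.01, d2=0.2)
print(rates)
```

The two sum-rate entries agree up to floating-point error, which is a numerical check of the equality between the two forms of the sum rate in this Gaussian setting.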

It is seen in the above discussion that the specific coding schemes depend on the distortion values, which is not
desirable since this usually suggests difficulty in proving the converse. The two coding schemes can be unified
into a single one by introducing an auxiliary random variable, as will be shown in the sequel; however, it appears
that the converse is indeed quite difficult to prove.

In the rest of this section, inner and outer bounds for $\mathcal{R}(D_1, D_2)$ are provided. The coding schemes for the
above Gaussian example are naturally generalized to give the inner bounds. It is further shown that the inner
bounds are in fact tight for certain special cases.

A. Two inner bounds 

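Define the region $\mathcal{R}_{in}(D_1, D_2)$ to be the set of all rate pairs $(R_1, R_2)$ for which there exist random variables
$(W_1, W_2, V)$ in finite alphabets $\mathcal{W}_1, \mathcal{W}_2, \mathcal{V}$ such that the following conditions are satisfied.

1) $(W_1, W_2, V) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ is a Markov string.

2) There exist deterministic maps $f_j : \mathcal{W}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j$ such that

$$E d_j(X, f_j(W_j, Y_j)) \le D_j, \quad j = 1, 2. \tag{11}$$

3) The non-negative rate pairs satisfy:

$$R_1 \ge I(X; V, W_1 | Y_1), \quad R_1 + R_2 \ge I(X; V, W_2 | Y_2) + I(X; W_1 | Y_1, V). \tag{12}$$

4) $W_1 \leftrightarrow (X, V) \leftrightarrow W_2$ is a Markov string.

5) The alphabets $\mathcal{V}$, $\mathcal{W}_1$ and $\mathcal{W}_2$ satisfy

$$|\mathcal{V}| \le |\mathcal{X}| + 3, \quad |\mathcal{W}_1| \le |\mathcal{X}|(|\mathcal{X}| + 3) + 1, \quad |\mathcal{W}_2| \le |\mathcal{X}|(|\mathcal{X}| + 3) + 1. \tag{13}$$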

The last two conditions can be removed without causing essential difference to the region $\mathcal{R}_{in}(D_1, D_2)$; with
them removed, no specific structure is required on the joint distribution of $(X, V, W_1, W_2)$. To see the last two
conditions indeed do not cause loss of generality, apply the support lemma [11] as follows. For an arbitrary joint
distribution of $(X, V, W_1, W_2)$ satisfying the first three conditions, we first reduce the cardinality of $\mathcal{V}$. To preserve
$P_X$ and the two distortions and two mutual information values, $|\mathcal{X}| + 3$ letters are needed. With this reduced
alphabet, observe that both the distortion and rate expressions depend only on the marginal of $(X, V, W_1)$ and
$(X, V, W_2)$, respectively, hence requiring $W_1 \leftrightarrow (X, V) \leftrightarrow W_2$ being a Markov string does not cause any loss
of generality. Next to reduce the cardinality of $\mathcal{W}_1$, it is seen $|\mathcal{X}||\mathcal{V}| - 1$ letters are needed to preserve the joint
distribution of $(X, V)$, one more is needed to preserve $D_1$ and another is needed to preserve $I(X; W_1 | Y_1, V)$.
Thus $|\mathcal{X}|(|\mathcal{X}| + 3) + 1$ letters suffice. Note that we do not need to preserve the value of $D_2$ and the value of the
other mutual information term because of the aforementioned Markov string. A similar argument holds for $|\mathcal{W}_2|$.
The following theorem asserts that $\mathcal{R}_{in}(D_1, D_2)$ is an achievable region.

Fig. 2. An illustration of the codewords in the nested binning structure. (Figure labels: a coarser bin; a finer bin.)

Theorem 1: For any discrete memoryless stochastic source with side informations under the Markov condition
$X \leftrightarrow Y_1 \leftrightarrow Y_2$,

$$\mathcal{R}(D_1, D_2) \supseteq \mathcal{R}_{in}(D_1, D_2).$$

This theorem is proved in Appendix II, and here we outline the coding scheme for this achievable region in an
intuitive manner. The encoder first encodes using a $V$ codebook with a "coarse" binning, such that decoder one
is able to decode it with side information $Y_1$. A Wyner-Ziv successive refinement coding (with side information
$Y_1$) is then added conditioned on the codeword $V$ also for decoder one using $W_1$. The encoder then enumerates
the binning of $V$ up to a level such that $V$ is decodable by decoder two using the weaker side information $Y_2$.
By doing so, decoder two is able to reduce the number of possible codewords in the (coarse) bin to a smaller
number, which essentially forms a "finer" bin; with the weaker side information $Y_2$, the $V$ codeword is then
decoded correctly with high probability. Another Wyner-Ziv successive refinement coding (with side information
$Y_2$) is finally added conditioned on the codeword $V$ for decoder two using a random codebook of $W_2$.

As seen in the above argument, in order to reduce the number of possible $V$ codewords from the first stage
to the second stage, the key idea is to construct a nested binning structure as illustrated in Fig. 2. Note that this
is fundamentally different from the code structure in SR-WZ, where no nested binning is needed. Each of the
coarser bins contains the same number of finer bins; each finer bin holds a certain number of codewords. They are
constructed in such a way that given the specific coarser bin index, the first stage decoder can decode in it with
the strong side information; at the second stage, additional bitstream is received by the decoder, which further
specifies one of the finer bins in the coarser bin, such that the second stage decoder can decode in this finer bin
using the weaker side information. If we assign each codeword to a finer bin independently, then its coarser bin
index is also independent of that of the other codewords.
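
The nested index structure described above can be mimicked with a toy random assignment. The following is only a sketch of the bookkeeping, not the typicality-based construction used in the proof: codewords are represented by integer indices, each is placed independently and uniformly in a finer bin, and the coarser bin index is derived from the finer one. All function names and sizes are hypothetical.

```python
import random
from collections import defaultdict


def build_nested_bins(num_codewords, num_coarse, fine_per_coarse, seed=0):
    """Assign every codeword index to a (coarser bin, finer bin within it) pair."""
    rng = random.Random(seed)
    num_fine = num_coarse * fine_per_coarse
    bins = defaultdict(list)
    for codeword in range(num_codewords):
        fine = rng.randrange(num_fine)  # independent, uniform finer-bin label
        coarse, within = divmod(fine, fine_per_coarse)
        bins[(coarse, within)].append(codeword)
    return dict(bins)


def first_stage_candidates(bins, coarse):
    """Codewords consistent with the first-stage message (the coarser bin index)."""
    return sorted(c for (cb, _), members in bins.items() if cb == coarse for c in members)


def second_stage_candidates(bins, coarse, within):
    """Codewords left after the second-stage message selects a finer bin."""
    return sorted(bins.get((coarse, within), []))


bins = build_nested_bins(num_codewords=64, num_coarse=4, fine_per_coarse=4)
coarse_list = first_stage_candidates(bins, coarse=2)
fine_list = second_stage_candidates(bins, coarse=2, within=1)
print(len(coarse_list), len(fine_list))
```

In this toy picture the first-stage decoder searches the longer list with the help of the strong side information, while the second-stage decoder receives the extra index and searches only the shorter list with the weaker side information.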

We note that the coding scheme does not explicitly require that side informations are degraded. Indeed as long
as the chosen random variable $V$ satisfies $I(V; Y_1) \ge I(V; Y_2)$ as well as the Markov condition, the region is
indeed achievable. More precisely, the following corollary is straightforward.

Corollary 1: For any discrete memoryless stochastic source with side informations $Y_1$ and $Y_2$ (without the
Markov structure), $\hat{\mathcal{R}}_{in}(D_1, D_2) \subseteq \mathcal{R}(D_1, D_2)$, where $\hat{\mathcal{R}}_{in}(D_1, D_2)$ is $\mathcal{R}_{in}(D_1, D_2)$ with the additional condition
that $I(V; Y_1) \ge I(V; Y_2)$.

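We can specialize the region $\mathcal{R}_{in}(D_1, D_2)$ to give another inner bound. Let $\tilde{\mathcal{R}}_{in}(D_1, D_2)$ be the set of all
rate pairs $(R_1, R_2)$ for which there exist random variables $(W_1, W_2)$ in finite alphabets $\mathcal{W}_1, \mathcal{W}_2$ such that the
following conditions are satisfied.

1) $W_1 \leftrightarrow W_2 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ or $W_2 \leftrightarrow W_1 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ is a Markov string.

2) There exist deterministic maps $f_j : \mathcal{W}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j$ such that

$$E d_j(X, f_j(W_j, Y_j)) \le D_j, \quad j = 1, 2. \tag{14}$$

3) The non-negative rate pairs satisfy:

$$R_1 \ge I(X; W_1 | Y_1), \quad R_1 + R_2 \ge I(X; W_2 | Y_2) + I(X; W_1 | Y_1, W_2). \tag{15}$$

4) The alphabets $\mathcal{W}_1$ and $\mathcal{W}_2$ satisfy

$$|\mathcal{W}_1| \le (|\mathcal{X}| + 3)(|\mathcal{X}|(|\mathcal{X}| + 3) + 1), \quad |\mathcal{W}_2| \le (|\mathcal{X}| + 3)(|\mathcal{X}|(|\mathcal{X}| + 3) + 1). \tag{16}$$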

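Corollary 2: For any discrete memoryless stochastic source with side informations under the Markov
condition $X \leftrightarrow Y_1 \leftrightarrow Y_2$,

$$\tilde{\mathcal{R}}_{in}(D_1, D_2) \subseteq \mathcal{R}_{in}(D_1, D_2) \subseteq \mathcal{R}(D_1, D_2).$$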

The region $\tilde{\mathcal{R}}_{in}(D_1, D_2)$ is particularly interesting for the following reasons. Firstly, it can be explicitly matched
back to the coding scheme for the simple Gaussian example. Secondly, it will be shown that one of the
outer bounds has the same rate and distortion expressions as $\tilde{\mathcal{R}}_{in}(D_1, D_2)$, only with a relaxed Markov string
requirement. We now prove this corollary.

Proof of Corollary 2:

When $W_1 \leftrightarrow W_2 \leftrightarrow X$, let $V = W_1$. Then the rate expressions in Theorem 1 give

$$R_1 \ge I(X; W_1 | Y_1), \quad R_1 + R_2 \ge I(X; V, W_2 | Y_2) + I(X; W_1 | V, Y_1) = I(X; W_2 | Y_2), \tag{17}$$

and therefore $\tilde{\mathcal{R}}_{in}(D_1, D_2) \subseteq \mathcal{R}_{in}(D_1, D_2)$ for this case. When $W_2 \leftrightarrow W_1 \leftrightarrow X$, let $V = W_2$. Then the rate
expressions in Theorem 1 give

$$\begin{aligned}
R_1 &\ge I(X; V, W_1 | Y_1) = I(X; W_1 | Y_1), \\
R_1 + R_2 &\ge I(X; V, W_2 | Y_2) + I(X; W_1 | V, Y_1) = I(X; W_2 | Y_2) + I(X; W_1 | W_2, Y_1),
\end{aligned}$$

and therefore $\tilde{\mathcal{R}}_{in}(D_1, D_2) \subseteq \mathcal{R}_{in}(D_1, D_2)$ for this case.
The cardinality bound here is larger than that in Theorem 1 because of the requirement to preserve the Markov
conditions. ■

B. Two outer bounds 

Define the following two regions, which will be shown to be two outer bounds. An obvious outer bound is
given by the intersection of the Wyner-Ziv rate distortion function and the rate-distortion function for the problem
considered by Heegard and Berger [7] with degraded side information $X \leftrightarrow Y_1 \leftrightarrow Y_2$:

$$\mathcal{R}_{\cap}(D_1, D_2) = \{ (R_1, R_2) : R_1 \ge R^*_{X|Y_1}(D_1), \ R_1 + R_2 \ge R_{HB}(D_1, D_2) \}. \tag{18}$$

A tighter outer bound is now given as follows: define the region $\mathcal{R}_{out}(D_1, D_2)$ to be the set of all rate pairs
$(R_1, R_2)$ for which there exist random variables $(W_1, W_2)$ in finite alphabets $\mathcal{W}_1, \mathcal{W}_2$ such that the following
conditions are satisfied.

1) $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$.

2) There exist deterministic maps $f_j : \mathcal{W}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j$ such that

$$E d_j(X, f_j(W_j, Y_j)) \le D_j, \quad j = 1, 2. \tag{19}$$

3) $|\mathcal{W}_1| \le |\mathcal{X}|(|\mathcal{X}| + 3) + 2$, $|\mathcal{W}_2| \le |\mathcal{X}| + 3$.

4) The non-negative rate pairs satisfy:

$$R_1 \ge I(X; W_1 | Y_1), \quad R_1 + R_2 \ge I(X; W_2 | Y_2) + I(X; W_1 | Y_1, W_2). \tag{20}$$

The main result of this subsection is the following theorem.

Theorem 2: For any discrete memoryless stochastic source with side informations under the Markov
condition $X \leftrightarrow Y_1 \leftrightarrow Y_2$,

$$\mathcal{R}_{\cap}(D_1, D_2) \supseteq \mathcal{R}_{out}(D_1, D_2) \supseteq \mathcal{R}(D_1, D_2).$$

The first inclusion $\mathcal{R}_{\cap}(D_1, D_2) \supseteq \mathcal{R}_{out}(D_1, D_2)$ is obvious, since $\mathcal{R}_{out}(D_1, D_2)$ takes the same form as
$R^*_{X|Y_1}(D_1)$ and $R_{HB}(D_1, D_2)$ when the rates $R_1$ and $R_1 + R_2$ are considered individually. Thus we will focus
on the latter inclusion, whose proof is given in Appendix III.

Note that the inner bound $\tilde{\mathcal{R}}_{in}(D_1, D_2)$ and $\mathcal{R}_{out}(D_1, D_2)$ have the same rate and distortion expressions and
they differ only by a Markov string requirement (ignoring the non-essential cardinality bounds). Because of the
difference in the domain of optimizations, the two bounds may not produce the same rate-regions. This is quite
similar to the case of distributed lossy source coding problem, for which the Berger-Tung inner bound requires a
long Markov string and the Berger-Tung outer bound requires only two short Markov strings [16], but their rate
and distortion expressions are the same.

C. Lossless reconstruction at one decoder

Since decoder one has better quality side information, it is reasonable for it to require a higher quality 
reconstruction. Alternatively, from the point of view of universal coding, when the encoder does not know 
the quality of the side information, it might assume the better quality one exists at the decoder and aim to 
reconstruct with a higher quality, compared with the case when the poorer quality side information is available.
In the extreme case, decoder one might require a lossless reconstruction. In this subsection, we consider the 
setting where either decoder one or decoder two requires lossless reconstruction. We have the following theorem. 

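Theorem 3: If $D_1 = 0$ with $d_1(\cdot, \cdot) \in \Gamma_d$, or $D_2 = 0$ with $d_2(\cdot, \cdot) \in \Gamma_d$ (see (7) for $\Gamma_d$), then $\mathcal{R}(D_1, D_2) =
\mathcal{R}_{in}(D_1, D_2)$. More precisely, for the former case,

$$\mathcal{R}(0, D_2) = \bigcup_{P_{W_2}(D_2)} \{ (R_1, R_2) : R_1 \ge H(X | Y_1), \ R_1 + R_2 \ge I(X; W_2 | Y_2) + H(X | Y_1, W_2) \}, \tag{21}$$

where $P_{W_2}(D_2)$ is the set of random variables satisfying the Markov string $W_2 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$, and having a
deterministic function $f_2$ satisfying $E d_2(f_2(W_2, Y_2), X) \le D_2$. For the latter case,

$$\mathcal{R}(D_1, 0) = \bigcup_{P_{W_1}(D_1)} \{ (R_1, R_2) : R_1 \ge I(X; W_1 | Y_1), \ R_1 + R_2 \ge H(X | Y_2) \}, \tag{22}$$

where $P_{W_1}(D_1)$ is the set of random variables satisfying the Markov string $W_1 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$, and having a
deterministic function $f_1$ satisfying $E d_1(f_1(W_1, Y_1), X) \le D_1$.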

Proof of Theorem 3: For $D_1 = 0$, let $W_1 = X$ and $V = W_2$. The achievable rate vector implied by Theorem 1
is given by

$$R_1 \ge H(X | Y_1), \quad R_1 + R_2 \ge I(X; W_2 | Y_2) + H(X | Y_1, W_2). \tag{23}$$

It is seen that this rate region is tight by the converse of Slepian-Wolf coding for rate $R_1$, and by (8) of
Heegard-Berger coding for rate $R_1 + R_2$.

For $D_2 = 0$, let $W_1 = V$ and $W_2 = X$. The achievable rate vector implied by Theorem 1 is given by

$$R_1 \ge I(X; W_1 | Y_1), \quad R_1 + R_2 \ge H(X | Y_2). \tag{24}$$

It is easily seen that this rate region is tight by the converse of Wyner-Ziv coding for rate $R_1$ and the converse
of Slepian-Wolf coding (or more precisely, Wyner-Ziv rate distortion function $R^*_{X|Y_2}(0)$ with $d_2(\cdot, \cdot) \in \Gamma_d$ as
given in [4]) for rate $R_1 + R_2$. ■

Zero distortion under a distortion measure $d \in \Gamma_d$ can be interpreted as lossless, however, it is a weaker
requirement than that the block error probability is arbitrarily small. Nevertheless, $\mathcal{R}(0, D_2)$ and $\mathcal{R}(D_1, 0)$ in
(21) and (22) still provide valid outer bounds for the more stringent lossless definition. On the other hand, it is
rather straightforward to specialize the coding scheme for these cases, and show that the same conclusion is true
for lossless coding in this case. Thus we have the following corollary.

Corollary 3: The rate region, when the first stage, and respectively the second stage, requires lossless reconstruction
in terms of arbitrarily small block error probability is given by (21), respectively (22).

The key difference from the general case when both stages are lossy is the elimination of the need to generate
one of the codebooks using an auxiliary random variable, which simplifies the matter tremendously. For example
when $D_2 = 0$, since the first stage encoder guarantees that $\boldsymbol{w}_1$ and $\boldsymbol{x}$ are jointly typical, the second stage only
needs to construct a codebook of $\boldsymbol{x}$ by binning the approximately $2^{nH(X|W_1)}$ such $\boldsymbol{x}$ vectors directly. Subsequently
the second stage encoder does not search for a vector $\boldsymbol{x}^*$ to be jointly typical with both $\boldsymbol{w}_1$ and $\boldsymbol{x}$, but instead
just sends the bin index of the observed source vector $\boldsymbol{x}$ directly. Alternatively, it can be understood as both the
encoder and decoder at the second stage have access to a side information vector $\boldsymbol{w}_1$, and thus a conditional
Slepian-Wolf coding with decoder side information $Y_2$ suffices.



D. Deterministic distortion measure 

Another case of interest is when some functions of the source $X$ are required to be reconstructed with arbitrarily small distortion in terms of Hamming distortion; see [17] for the corresponding case for the multiple description problem. More precisely, let $Q_i : \mathcal{X} \to \mathcal{Z}_i$, $i = 1, 2$, be two deterministic functions and denote $Z_i = Q_i(X)$. Consider the case that decoder $i$ seeks to reconstruct $Z_i$ with arbitrarily small Hamming distortion¹. The achievable region $\mathcal{R}_{in}$ is tight when the functions satisfy a certain degradedness condition, as stated in the following theorem.

Theorem 4: Let the distortion measure be Hamming distortion $d_H : \mathcal{Z}_i \times \mathcal{Z}_i \to \{0, 1\}$ for $i = 1, 2$.

1) If there exists a deterministic function $Q' : \mathcal{Z}_1 \to \mathcal{Z}_2$ such that $Q_2 = Q' \circ Q_1$, then $\mathcal{R}(0,0) = \mathcal{R}_{in}(0,0)$. More precisely

$$\mathcal{R}(0,0) = \{(R_1, R_2) : R_1 \geq H(Z_1|Y_1),\; R_1 + R_2 \geq H(Z_2|Y_2) + H(Z_1|Y_1 Z_2)\}. \tag{25}$$

2) If there exists a deterministic function $Q' : \mathcal{Z}_2 \to \mathcal{Z}_1$ such that $Q_1 = Q' \circ Q_2$, then $\mathcal{R}(0,0) = \mathcal{R}_{in}(0,0)$. More precisely

$$\mathcal{R}(0,0) = \{(R_1, R_2) : R_1 \geq H(Z_1|Y_1),\; R_1 + R_2 \geq H(Z_2|Y_2)\}. \tag{26}$$
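As a numerical illustration of (25), the following minimal sketch evaluates the two entropy bounds for a toy source. The joint distribution (a uniform two-bit source whose low bit is observed through a noisy channel at the first decoder, and a constant at the second decoder), the functions `Q1` and `Q2`, and all helper names are assumptions made for this example only; they do not come from the paper.

```python
import math
from collections import defaultdict

def cond_entropy(joint, target, given):
    """H(target | given) in bits; `joint` maps outcome tuples to probabilities,
    `target` and `given` extract the relevant values from an outcome."""
    joint_tg = defaultdict(float)
    marg_g = defaultdict(float)
    for outcome, prob in joint.items():
        if prob > 0:
            t, g = target(outcome), given(outcome)
            joint_tg[(t, g)] += prob
            marg_g[g] += prob
    return -sum(p * math.log2(p / marg_g[g]) for (t, g), p in joint_tg.items())

# Hypothetical toy source: X uniform on {0,1,2,3}; Y1 is the low bit of X sent
# through a BSC with crossover eps; Y2 is a constant, so X <-> Y1 <-> Y2 holds.
eps = 0.1
joint = {}
for x in range(4):
    for noise, p_noise in ((0, 1 - eps), (1, eps)):
        y1 = (x & 1) ^ noise
        joint[(x, y1, 0)] = joint.get((x, y1, 0), 0.0) + 0.25 * p_noise

Q1 = lambda x: x        # Z1 = X (assumed example function)
Q2 = lambda x: x >> 1   # Z2 = high bit of X, so Q2 = Q' o Q1 with Q'(z) = z >> 1

# First-stage bound H(Z1|Y1) and sum-rate bound H(Z2|Y2) + H(Z1|Y1 Z2) from (25)
r1_min = cond_entropy(joint, lambda o: Q1(o[0]), lambda o: o[1])
sum_min = (cond_entropy(joint, lambda o: Q2(o[0]), lambda o: o[2])
           + cond_entropy(joint, lambda o: Q1(o[0]), lambda o: (o[1], Q2(o[0]))))
print(f"R1 >= {r1_min:.4f}, R1 + R2 >= {sum_min:.4f}")
```

In this particular toy case the two bounds coincide, so under these assumptions the second stage needs no additional rate.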

Proof of Theorem 4: To prove (25), first observe that by letting $W_1 = Z_1$ and $V = W_2 = Z_2$, $\mathcal{R}_{in}$ clearly reduces to the given expression. For the converse, we start from the outer bound $\mathcal{R}_{out}(0, 0)$, which implies that $Z_1$ is a function of $W_1$ and $Y_1$, and $Z_2$ is a function of $W_2$ and $Y_2$. For the first stage rate $R_1$, we have the following chain

$$R_1 \geq I(X; W_1|Y_1) = I(X; W_1 Z_1|Y_1) \geq I(X; Z_1|Y_1) = H(Z_1|Y_1) - H(Z_1|X, Y_1) = H(Z_1|Y_1). \tag{27}$$



¹By a similar argument as in the last subsection, the same result holds if block error probability is made arbitrarily small.
For the sum rate, we have

$$\begin{aligned}
R_1 + R_2 &\geq I(X; W_2|Y_2) + I(X; W_1|W_2 Y_1) \\
&= I(X; W_2 Z_2|Y_2) + I(X; W_1|W_2 Y_1) \\
&= I(X; Z_2|Y_2) + I(X; W_2|Y_2 Z_2) + I(X; W_1|W_2 Y_1) \\
&= H(Z_2|Y_2) + I(X; W_2|Y_2 Z_2) + I(X; W_1|W_2 Y_1) \\
&\overset{(a)}{\geq} H(Z_2|Y_2) + I(X; W_2|Y_1 Y_2 Z_2) + I(X; W_1|W_2 Y_1) \\
&\overset{(b)}{=} H(Z_2|Y_2) + I(X; W_2|Y_1 Y_2 Z_2) + I(X; W_1|W_2 Y_1 Y_2) \\
&= H(Z_2|Y_2) + I(X; W_2|Y_1 Y_2 Z_2) + I(X; W_1|W_2 Y_1 Y_2 Z_2) \\
&= H(Z_2|Y_2) + I(X; W_1 W_2|Y_1 Y_2 Z_2) \\
&\geq H(Z_2|Y_2) + I(X; Z_1|Y_1 Y_2 Z_2) \\
&= H(Z_2|Y_2) + H(Z_1|Y_1 Y_2 Z_2) \\
&\overset{(c)}{=} H(Z_2|Y_2) + H(Z_1|Y_1 Z_2),
\end{aligned}$$



where (a) is due to the Markov string $W_2 \leftrightarrow X \leftrightarrow (Y_1 Y_2)$ and the fact that $Z_2$ is a function of $X$; (b) is due to the Markov string $(W_1 W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$; (c) is due to the Markov string $(Z_1, Z_2) \leftrightarrow Y_1 \leftrightarrow Y_2$. The proof of part 2) (i.e., the relationship (26)) is straightforward and is omitted.

Clearly in the converse proof, the requirement that the functions $Q_1$ and $Q_2$ are degraded is not needed. Indeed this outer bound holds for any general functions; however, the degradedness is needed for establishing the achievability of the region. If the coding is not necessarily scalable, then it can be seen that the sum rate is indeed achievable, and the result above can be used to establish a non-trivial special result in the context of the problem treated by Heegard and Berger [7].

Corollary 4: Let the two functions $Q_1$ and $Q_2$ be arbitrary, and let the distortion measure be Hamming distortion $d_H : \mathcal{Z}_i \times \mathcal{Z}_i \to \{0, 1\}$ for $i = 1, 2$, then we have

$$R_{HB}(0, 0) = H(Z_2|Y_2) + H(Z_1|Y_1 Z_2). \tag{28}$$

IV. Perfect Scalability and a Binary Source

In this section we introduce the notion of perfect scalability, which is defined as when both the stages operate at the Wyner-Ziv rates. We further examine the doubly symmetric binary source, provide a partial characterization and investigate its scalability. The quadratic Gaussian source with jointly Gaussian side informations is treated in Section VI in a more general setting.

A. Perfect Scalability

The notion of (strict) successive refinability defined in [5] for the SR-WZ problem with forward degradation in the side-informations (SI) can be applied to the reversely degraded case considered in this paper. This is done by introducing the notion of perfect scalability for the SI-scalable problem defined below.

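Definition 3: A source $X$ is said to be perfectly scalable for distortion pair $(D_1, D_2)$, with side informations under the Markov string $X \leftrightarrow Y_1 \leftrightarrow Y_2$, if the rate pair

$$(R^*_{X|Y_1}(D_1),\; R^*_{X|Y_2}(D_2) - R^*_{X|Y_1}(D_1))$$

is achievable for $(D_1, D_2)$.

Theorem 5: A source $X$ with side informations under the Markov string $X \leftrightarrow Y_1 \leftrightarrow Y_2$, for which $\exists\, y_1 \in \mathcal{Y}_1$ such that $P_{XY_1}(x, y_1) > 0$ for each $x \in \mathcal{X}$, is perfectly scalable for distortion pair $(D_1, D_2)$ if and only if there exist random variables $(W_1, W_2)$ and deterministic maps $f_j : \mathcal{W}_j \times \mathcal{Y}_j \to \hat{\mathcal{X}}_j$ such that the following conditions hold simultaneously:

1) $R^*_{X|Y_j}(D_j) = I(X; W_j|Y_j)$ and $E\,d_j(X, f_j(W_j, Y_j)) \leq D_j$, for $j = 1, 2$.

2) $W_1 \leftrightarrow W_2 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ forms a Markov string.

3) The alphabets $\mathcal{W}_1$ and $\mathcal{W}_2$ satisfy $|\mathcal{W}_1| \leq |\mathcal{X}|(|\mathcal{X}| + 3) + 2$ and $|\mathcal{W}_2| \leq |\mathcal{X}| + 3$.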

The Markov string is the most crucial condition, and the substring $W_1 \leftrightarrow W_2 \leftrightarrow X$ is the same as one of the conditions for successive refinability without side information [2] [3]. The support condition essentially requires the existence of a worst letter $y_1$ in the alphabet $\mathcal{Y}_1$ such that it has non-zero probability mass for each $(x, y_1)$ pair, $x \in \mathcal{X}$.

Proof of Theorem 5:

The sufficiency being trivial, we only prove the necessity. Without loss of generality, assume $P_X(x) > 0$ for all $x \in \mathcal{X}$. If $(R^*_{X|Y_1}(D_1),\, R^*_{X|Y_2}(D_2) - R^*_{X|Y_1}(D_1))$ is achievable for $(D_1, D_2)$, then using the tighter outer bound $\mathcal{R}_{out}(D_1, D_2)$ of Theorem 2, there exist random variables $W_1, W_2$ in finite alphabets, whose sizes are bounded as $|\mathcal{W}_1| \leq |\mathcal{X}|(|\mathcal{X}| + 3) + 2$ and $|\mathcal{W}_2| \leq |\mathcal{X}| + 3$, and functions $f_1, f_2$ such that $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ is a Markov string, $E\,d_j(X, f_j(W_j, Y_j)) \leq D_j$ for $j = 1, 2$ and

$$R^*_{X|Y_1}(D_1) \geq I(X; W_1|Y_1), \qquad R^*_{X|Y_2}(D_2) \geq I(X; W_2|Y_2) + I(X; W_1|Y_1, W_2). \tag{29}$$

It follows

$$R^*_{X|Y_2}(D_2) \geq I(X; W_2|Y_2) + I(X; W_1|Y_1, W_2) \geq I(X; W_2|Y_2) \overset{(a)}{\geq} R^*_{X|Y_2}(D_2), \tag{30}$$

where (a) follows from the converse of the rate-distortion theorem for Wyner-Ziv coding. Since the leftmost and the rightmost quantities are the same, all the inequalities must be equalities in (30), and it follows that $I(X; W_1|Y_1, W_2) = 0$. Similarly we have

$$R^*_{X|Y_1}(D_1) \geq I(X; W_1|Y_1) \geq R^*_{X|Y_1}(D_1), \tag{31}$$

thus (31) also holds with equality.

Notice that if $W_1 \leftrightarrow W_2 \leftrightarrow X$ is a Markov string, then we can use Corollary 2 to claim the sufficiency and complete the proof. However, this Markov condition is not true in general. This is where the support condition is needed.
For convenience, define the set

$$F(w_2) = \{x \in \mathcal{X} : P(x, w_2) > 0\}. \tag{32}$$

By the Markov string $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1$, the joint distribution of $(w_1, w_2, x, y_1)$ can be factorized as follows

$$P(w_1, w_2, x, y_1) = P(x, y_1) P(w_2|x) P(w_1|x, w_2). \tag{33}$$

Furthermore, $I(X; W_1|Y_1, W_2) = 0$ implies the Markov string $X \leftrightarrow (W_2, Y_1) \leftrightarrow W_1$, and thus the joint distribution of $(w_1, w_2, x, y_1)$ can also be factorized as follows

$$P(w_1, w_2, x, y_1) = P(x, y_1, w_2) P(w_1|y_1, w_2) \overset{(a)}{=} P(x, y_1) P(w_2|x) P(w_1|y_1, w_2), \tag{34}$$

where (a) follows by the Markov substring $W_2 \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$. Fix an arbitrary $(w_1^*, w_2^*)$ pair; by the assumption that $P(x, y_1) > 0$ for any $x \in \mathcal{X}$, we have

$$P(w_2^*|x) P(w_1^*|x, w_2^*) = P(w_2^*|x) P(w_1^*|y_1, w_2^*) \tag{35}$$

for any $x \in \mathcal{X}$. Thus for any $x \in F(w_2^*)$ (see the definition in (32)) such that $P(w_1^*|x, w_2^*)$ is well defined, we have

$$P(w_1^*|y_1, w_2^*) = P(w_1^*|x, w_2^*) \tag{36}$$

and it further implies

$$P(w_1^*|w_2^*) = \frac{\sum_{x} P(x, w_1^*, w_2^*)}{P(w_2^*)} = \frac{\sum_{x} P(x, w_2^*) P(w_1^*|x, w_2^*)}{P(w_2^*)} = P(w_1^*|y_1, w_2^*) = P(w_1^*|x, w_2^*) \tag{37}$$

for any $x \in F(w_2^*)$. This indeed implies that $W_1 \leftrightarrow W_2 \leftrightarrow X$ is a Markov string, which completes the proof. ■

B. The Doubly Symmetric Binary Source with Hamming Distortion Measure 

Consider the following source: $X$ is a memoryless binary source, $X \in \{0, 1\}$ and $P(X = 0) = 0.5$. The first stage side information $Y$ can be taken as the output of a binary symmetric channel with input $X$ and crossover probability $p < 0.5$. The second stage does not have side information. This source clearly satisfies the support condition in Theorem 5. It will be shown that for some distortion pairs, this source is perfectly scalable, while for others this is not possible. We first provide partial results using $\mathcal{R}_{in}$ and $\mathcal{R}_\cap$ previously given.

An explicit calculation of $R_{HB}(D_1, D_2)$, together with the optimal forward test channel structure, was given in a recent work [6]. With this explicit calculation, it can be shown that in the shaded region in Fig. 3, the outer bound $\mathcal{R}_\cap(D_1, D_2)$ is in fact achievable (as well as in Regions II, III and IV; however these three regions are degenerate cases, and will be ignored in what follows). Recall the definition of the critical distortion $d_c$ in the Wyner-Ziv problem for the DSBS source in [4]

$$\frac{G(d_c)}{d_c - p} = G'(d_c),$$

where $G(u) = h_b(p * u) - h_b(u)$, $h_b(u)$ is the binary entropy function $h_b(u) = -u \log u - (1 - u) \log(1 - u)$, and $u * v$ is the binary convolution for $0 \leq u, v \leq 1$ as $u * v = u(1 - v) + v(1 - u)$.

Fig. 3. The partition of the distortion region, where $d_c$ is the critical distortion in [4] below which time sharing is not necessary.

Fig. 4. The forward test channel in Region I-D. The crossover probability for the BSC between $X$ and $W_1$ is $D_1$, while the crossover probability $\eta$ for the BSC between $W_1$ and $W_2$ is such that $D_1 * \eta = D_2$.

It was shown in [4] that if $D \leq d_c$, then $R^*_{X|Y}(D) = G(D)$. We will use the following result from [6].

Theorem 6: For distortion pairs $(D_1, D_2)$ such that $0 \leq D_2 \leq 0.5$ and $0 \leq D_1 \leq \min(d_c, D_2)$ (i.e., Region I-D),

$$R_{HB}(D_1, D_2) = 1 - h_b(D_2 * p) + G(D_1).$$

This result implies that for the shaded Region I-D, the forward test channel to achieve this lower bound is in fact a cascade of two BSC channels depicted in Fig. 4. This choice clearly satisfies the condition in Corollary 2 with the rates given by the outer bound $\mathcal{R}_\cap(D_1, D_2)$, which shows that this outer bound is indeed achievable. Note the following inequality

$$R_{HB}(D_1, D_2) = 1 - h_b(D_2 * p) + h_b(p * D_1) - h_b(D_1) \geq 1 - h_b(D_2) = R(D_2), \tag{38}$$

where the inequality is due to the monotonicity of $G(u)$ in $0 \leq u \leq 0.5$ and is strict whenever $D_1 < D_2$. We conclude that in this regime the source is not perfectly scalable.
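The quantities used above can be evaluated numerically. The following is a minimal sketch, assuming the tangent condition $G(d_c)/(d_c - p) = G'(d_c)$ for the critical distortion and rates measured in bits; the function names, the bisection routine and the parameter values $p = 0.25$, $D_1 = 0.05$, $D_2 = 0.2$ are illustrative choices, not taken from the paper.

```python
import math

def hb(u):
    """Binary entropy h_b(u) in bits, with h_b(0) = h_b(1) = 0."""
    if u <= 0.0 or u >= 1.0:
        return 0.0
    return -u * math.log2(u) - (1 - u) * math.log2(1 - u)

def conv(u, v):
    """Binary convolution u * v = u(1 - v) + v(1 - u)."""
    return u * (1 - v) + v * (1 - u)

def G(u, p):
    """G(u) = h_b(p * u) - h_b(u)."""
    return hb(conv(p, u)) - hb(u)

def G_prime(u, p):
    """Derivative of G with respect to u, valid for 0 < u < 1 and 0 < p < 0.5."""
    q = conv(p, u)
    return (1 - 2 * p) * math.log2((1 - q) / q) - math.log2((1 - u) / u)

def tangent_residual(d, p):
    """Zero exactly when G(d) / (d - p) = G'(d)."""
    return G(d, p) - (d - p) * G_prime(d, p)

def critical_distortion(p, tol=1e-12):
    """Bisection for d_c on (0, p): the residual is negative near 0, positive at p."""
    lo, hi = 1e-9, p
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tangent_residual(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_hb_region_ID(D1, D2, p):
    """Theorem 6: R_HB(D1, D2) = 1 - h_b(D2 * p) + G(D1), valid in Region I-D."""
    return 1 - hb(conv(D2, p)) + G(D1, p)

p, D1, D2 = 0.25, 0.05, 0.2          # assumed example values
d_c = critical_distortion(p)
r_hb = rate_hb_region_ID(D1, D2, p)
r_no_si = 1 - hb(D2)                 # R(D2): second stage has no side information
print(f"d_c = {d_c:.4f}, R_HB = {r_hb:.4f}, R(D2) = {r_no_si:.4f}")
```

For these example values the pair lies in Region I-D and the computed $R_{HB}$ exceeds $R(D_2)$, in line with (38).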

To see that $\mathcal{R}_\cap(D_1, D_2)$ is also achievable in Region I-C, recall the result in [4] that the optimal forward test channel to achieve $R^*_{X|Y}(D)$ has the following structure: it is the time-sharing between zero-rate coding and a BSC with crossover probability $d_c$ if $D \geq d_c$, or a single BSC with crossover probability $D$ otherwise. Thus it is straightforward to verify that $\mathcal{R}_\cap(D_1, D_2)$ is achievable by time sharing the two forward test channels in Fig. 5; furthermore, an equivalent forward test channel can be found such that the Markov condition $W_1' \leftrightarrow W_2' \leftrightarrow X$ is satisfied, which satisfies the conditions given in Theorem 5. Thus in this regime, the source is in fact perfectly scalable.

Fig. 5. The forward test channels in Region I-C. The crossover probability for the BSC between $X$ and $W_2$ is $D_2$ in both the channels, while the crossover probability $\eta$ for the BSC between $W_2$ and $W_1$ in (a) is such that $D_2 \leq D_2 * \eta = \eta' \leq d_c$. Note for (b), $W_1$ can be taken as a constant.

Fig. 6. The rate outer bounds for a particular choice of $D_1$, $D_2$ in Region I-B of Fig. 3.

Unfortunately, we were not able to find the complete characterization for the regimes I-A and I-B. Using an approach similar to [6], an explicit outer bound can be derived from $\mathcal{R}_{out}(D_1, D_2)$. It can then be shown numerically that for certain distortion pairs in this regime, $\mathcal{R}_{out}(D_1, D_2)$ is strictly tighter than $\mathcal{R}_\cap(D_1, D_2)$. This calculation can be found in [18] and is omitted here. An example is given in Fig. 6 for the two outer bounds with a non-zero gap in between for a specific distortion pair in Region I-B.



V. A Near Sufficiency Result 

By using the tool of rate loss introduced by Zamir [14], which was further developed in [15], [19]-[21], it can be shown that when both the source and reconstruction alphabets are reals, and the distortion measure is MSE, the gap between the achievable region and the outer bounds is bounded by a constant. Thus the inner and outer bounds are nearly sufficient in the sense defined in [15]. To show this result, we distinguish the two cases $D_1 \geq D_2$ and $D_1 \leq D_2$. The source $X$ is assumed to have finite variance $\sigma_x^2$ and finite (differential) entropy. The result of this section is summarized in Fig. 7.

Fig. 7. An illustration of the gap between the inner bound and the outer bounds when MSE is the distortion measure. The two regions $\mathcal{R}_{in}(D_1, D_2)$ and $\mathcal{R}_{out}(D_1, D_2)$ are given in dashed lines, since it is unknown whether they are indeed the same.



A. The case $D_1 \geq D_2$

Construct two random variables $W_1' = X + N_1 + N_2$ and $W_2' = X + N_2$, where $N_1$ and $N_2$ are zero mean independent Gaussian random variables, independent of everything else, with variances $\sigma_1^2$ and $\sigma_2^2$ such that $\sigma_1^2 + \sigma_2^2 = D_1$ and $\sigma_2^2 = D_2$. By letting $V' = W_1'$, it is obvious from Theorem 1 that the following rates are achievable for distortion $(D_1, D_2)$

$$R_1 = I(X; X + N_1 + N_2|Y_1), \qquad R_1 + R_2 = I(X; X + N_2|Y_2). \tag{39}$$

Let $U$ be the optimal random variable to achieve the Wyner-Ziv rate at distortion $D_1$ given decoder side information $Y_1$. Then it is clear that the difference between $R_1$ and the Wyner-Ziv rate can be bounded as

$$\begin{aligned}
I(X; X + N_1 + N_2|Y_1) - I(X; U|Y_1)
&\overset{(a)}{=} I(X; X + N_1 + N_2|U Y_1) - I(X; U|Y_1, X + N_1 + N_2) \\
&\leq I(X; X + N_1 + N_2|U Y_1) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1 + N_2|U Y_1) \\
&\leq I(X - \hat{X}_1, U, Y_1; X - \hat{X}_1 + N_1 + N_2) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1 + N_2) + I(U, Y_1; X - \hat{X}_1 + N_1 + N_2|X - \hat{X}_1) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1 + N_2) \\
&\overset{(b)}{\leq} \frac{1}{2} \log_2 \frac{D_1 + D_1}{D_1} = 0.5
\end{aligned} \tag{40}$$

where (a) is by applying the chain rule to $I(X; X + N_1 + N_2, U|Y_1)$ in two different ways; (b) is true because $\hat{X}_1$ is the decoding function given $(U, Y_1)$, the distortion between $X$ and $\hat{X}_1$ is bounded by $D_1$, and $X - \hat{X}_1$ is independent of $(N_1, N_2)$.



Now we turn to bound the gap for the sum rate $R_1 + R_2$. Let $W_1$ and $W_2$ be the two random variables that achieve the rate distortion function $R_{HB}(D_1, D_2)$. First notice the following two identities, which hold because of the Markov string $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ and because $(N_1, N_2)$ are independent of $(X, Y_1, Y_2)$

$$I(X; W_2|Y_2) + I(X; W_1|W_2 Y_1) = I(X; W_1 W_2|Y_1) + I(Y_1; W_2|Y_2) \tag{41}$$
$$I(X; X + N_2|Y_2) = I(X; X + N_2|Y_1) + I(Y_1; X + N_2|Y_2). \tag{42}$$

Next we can bound the difference between the sum rate $R_1 + R_2$ (as given in (39)) and the Heegard-Berger sum rate as follows.

$$\begin{aligned}
& I(X; X + N_2|Y_2) - I(X; W_2|Y_2) - I(X; W_1|W_2 Y_1) \\
&\quad = \{I(X; X + N_2|Y_1) - I(X; W_1 W_2|Y_1)\} + \{I(Y_1; X + N_2|Y_2) - I(Y_1; W_2|Y_2)\}.
\end{aligned} \tag{43}$$

To bound the first bracket, notice that

$$\begin{aligned}
I(X; X + N_2|Y_1) - I(X; W_1 W_2|Y_1)
&= I(X; X + N_2|W_1 W_2 Y_1) - I(X; W_1 W_2|Y_1, X + N_2) \\
&\leq I(X; X + N_2|W_1 W_2 Y_1) \\
&\overset{(a)}{=} I(X; X + N_2|W_1 W_2 Y_1 Y_2) \\
&= I(X - \hat{X}_2; X - \hat{X}_2 + N_2|W_1 W_2 Y_1 Y_2) \\
&\leq I(X - \hat{X}_2, W_1, W_2, Y_1, Y_2; X - \hat{X}_2 + N_2) \\
&= I(X - \hat{X}_2; X - \hat{X}_2 + N_2) + I(W_1, W_2, Y_1, Y_2; X - \hat{X}_2 + N_2|X - \hat{X}_2) \\
&= I(X - \hat{X}_2; X - \hat{X}_2 + N_2) \leq \frac{1}{2} \log_2 \frac{D_2 + D_2}{D_2} = 0.5
\end{aligned} \tag{44}$$

where (a) is due to the Markov string $(W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$, $\hat{X}_2$ is the decoding function given $(W_2, Y_2)$, and the other inequalities follow similar arguments as in (40). To bound the second bracket, we write the following

$$\begin{aligned}
I(Y_1; X + N_2|Y_2) - I(Y_1; W_2|Y_2)
&= I(Y_1; X + N_2|W_2 Y_2) - I(Y_1; W_2|Y_2, X + N_2) \\
&\leq I(Y_1; X + N_2|W_2 Y_2) \\
&\leq I(X Y_1; X + N_2|W_2 Y_2) \\
&= I(X; X + N_2|W_2 Y_2) \leq \frac{1}{2} \log_2 \frac{D_2 + D_2}{D_2} = 0.5
\end{aligned} \tag{45}$$

Thus we have shown that for $D_1 \geq D_2$, the gap between the outer bound $\mathcal{R}_\cap(D_1, D_2)$ and the inner bound $\mathcal{R}_{in}(D_1, D_2)$ is bounded. More precisely, the gap for $R_1$ is bounded by 0.5 bit, while the gap for the sum rate is bounded by 1.0 bit.
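To make the 0.5 bit figure for the first stage concrete, the sketch below evaluates the gap in closed form for one special case. It assumes, purely for illustration, that $X$ and $Y_1 = X + M$ are jointly Gaussian with independent noise $M$ (the bound in the text holds for general sources); the function names and numerical values are hypothetical.

```python
import math

def cond_var(var_x, var_noise):
    """Var(X | Y) for Y = X + M, with X and M independent zero-mean Gaussians."""
    return var_x * var_noise / (var_x + var_noise)

def scheme_rate(var_x_given_y, D):
    """I(X; X + N | Y) in bits for N ~ N(0, D) independent of (X, Y)."""
    return 0.5 * math.log2((var_x_given_y + D) / D)

def wyner_ziv_rate(var_x_given_y, D):
    """Quadratic Gaussian Wyner-Ziv rate in bits (zero once D >= Var(X|Y))."""
    return max(0.0, 0.5 * math.log2(var_x_given_y / D))

def first_stage_gap(var_x, var_noise, D1):
    """Rate of the additive-noise test channel minus the Wyner-Ziv rate."""
    v = cond_var(var_x, var_noise)
    return scheme_rate(v, D1) - wyner_ziv_rate(v, D1)

# Assumed example: unit-variance source, unit-variance side-information noise.
gaps = [first_stage_gap(1.0, 1.0, D1) for D1 in (0.01, 0.1, 0.5, 2.0)]
print([round(g, 4) for g in gaps])
```

Under these assumptions the gap equals $\frac{1}{2}\log_2(1 + D_1/\sigma^2_{X|Y_1})$ whenever $D_1 \leq \sigma^2_{X|Y_1}$, so it approaches 0.5 bit only as $D_1$ approaches the conditional variance.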



B. The case $D_1 \leq D_2$

Construct random variables $W_1' = X + N_1$ and $W_2' = X + N_1 + N_2$, where $N_1$ and $N_2$ are zero mean independent Gaussian random variables, independent of everything else, with variances $\sigma_1^2$ and $\sigma_2^2$ such that $\sigma_1^2 = D_1$ and $\sigma_1^2 + \sigma_2^2 = D_2$. By letting $V' = W_2' = X + N_1 + N_2$, it is easily seen that the following rates are achievable for distortion $(D_1, D_2)$

$$R_1 = I(X; X + N_1|Y_1)$$
$$R_1 + R_2 = I(X; X + N_1 + N_2|Y_2) + I(X; X + N_1|Y_1, X + N_1 + N_2).$$

Clearly, the argument for the first stage $R_1$ still holds with minor changes. To bound the sum-rate gap, notice the following identity

\begin{align}
& I(X; X + N_1 + N_2|Y_2) + I(X; X + N_1|Y_1, X + N_1 + N_2) \nonumber \\
&\quad = I(X; X + N_1 + N_2|Y_1) + I(Y_1; X + N_1 + N_2|Y_2) + I(X; X + N_1|Y_1, X + N_1 + N_2) \tag{46} \\
&\quad = I(Y_1; X + N_1 + N_2|Y_2) + I(X; X + N_1|Y_1). \tag{47}
\end{align}

Next we seek to upper bound the following quantity

$$\begin{aligned}
& I(X; X + N_1 + N_2|Y_2) + I(X; X + N_1|Y_1, X + N_1 + N_2) - I(X; W_2|Y_2) - I(X; W_1|W_2 Y_1) \\
&\quad = \{I(X; X + N_1|Y_1) - I(X; W_1 W_2|Y_1)\} + \{I(Y_1; X + N_1 + N_2|Y_2) - I(Y_1; W_2|Y_2)\},
\end{aligned} \tag{48}$$

where again $W_1, W_2$ are the R-D optimal random variables for $R_{HB}(D_1, D_2)$. For the first bracket, we have

$$\begin{aligned}
I(X; X + N_1|Y_1) - I(X; W_1 W_2|Y_1)
&= I(X; X + N_1|W_1 W_2 Y_1) - I(X; W_1 W_2|Y_1, X + N_1) \\
&\leq I(X; X + N_1|W_1 W_2 Y_1) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1|W_1 W_2 Y_1) \\
&\leq I(X - \hat{X}_1, W_1, W_2, Y_1; X - \hat{X}_1 + N_1) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1) + I(W_1, W_2, Y_1; X - \hat{X}_1 + N_1|X - \hat{X}_1) \\
&= I(X - \hat{X}_1; X - \hat{X}_1 + N_1) \leq 0.5
\end{aligned} \tag{49}$$

where $\hat{X}_1$ is the decoding function given $(W_1, Y_1)$, and the last inequality holds because the distortion between $X$ and $\hat{X}_1$ is bounded by $D_1$ while $N_1$ has variance $D_1$. For the second bracket, following a similar approach as in (45), we have

$$\begin{aligned}
I(Y_1; X + N_1 + N_2|Y_2) - I(Y_1; W_2|Y_2)
&\leq I(X; X + N_1 + N_2|W_2 Y_2) \\
&\leq I(X - \hat{X}_2, W_2, Y_2; X - \hat{X}_2 + N_1 + N_2) \\
&= I(X - \hat{X}_2; X - \hat{X}_2 + N_1 + N_2) \leq 0.5
\end{aligned}$$

Thus we conclude that for both cases the gap between the inner bound and the outer bound is bounded. Fig. 7 illustrates the inner bound and outer bounds, as well as the gap in between.

VI. The Quadratic Gaussian Source with Jointly Gaussian Side Informations 

The degraded side information assumption, either $X \leftrightarrow Y_1 \leftrightarrow Y_2$ or $X \leftrightarrow Y_2 \leftrightarrow Y_1$, for the quadratic jointly Gaussian case is especially interesting, since physical degradedness and stochastic degradedness [22] do not cause an essential difference in terms of the rate-distortion region for the problem being considered [5]. Moreover, since jointly Gaussian source and side informations are always statistically degraded, the forwardly and reversely degraded cases together provide a complete solution to the jointly Gaussian case with two decoders.

In this section we in fact consider a more general setting with an arbitrary number of decoders for a jointly Gaussian source and multiple side informations. Though the source and side informations can have arbitrary correlation, in light of the discussion above, we will treat only physically degraded side informations. Note that since a specific encoding order is specified, though the side informations are degraded as an unordered set, the quality of the side informations may not be monotonic along the scalable coding order. Clearly the solution for the two stage case can be reduced in a straightforward manner from the general solution. Recall from Theorem 2 that $\mathcal{R}_\cap(D_1, D_2)$ is an outer bound derived from the intersection of the Heegard-Berger and Wyner-Ziv bounds. The generalization of the outer bound $\mathcal{R}_\cap(D_1, D_2)$ to $N$ decoders plays an important role, and therefore we take a detour in Section VI-A to start with the characterization of $R_{HB}(D_1, D_2, \ldots, D_N)$ for the jointly Gaussian case.



A. $R_{HB}(D_1, D_2, \ldots, D_N)$ for the jointly Gaussian case



Consider the following source $X \sim \mathcal{N}(0, \sigma_x^2)$, and side informations $Y_k = X + \sum_{i=1}^{k} N_i$, where $N_i \sim \mathcal{N}(0, \sigma_i^2)$ are mutually independent and independent of $X$. The result by Heegard and Berger [7] gives

$$R_{HB}(D_1, D_2, \ldots, D_N) = \min_{p(D_1, D_2, \ldots, D_N)} \sum_{k=1}^{N} I(X; W_k|Y_k, W_{k+1}, W_{k+2}, \ldots, W_N), \tag{50}$$

where $p(D_1, D_2, \ldots, D_N)$ is the set of all random variables $(W_1, W_2, \ldots, W_N)$ with the Markov string $(W_1, W_2, \ldots, W_N) \leftrightarrow X \leftrightarrow (Y_1, Y_2, \ldots, Y_N)$, such that deterministic functions $f_k(Y_k, W_k, W_{k+1}, \ldots, W_N)$, $k = 1, \ldots, N$, exist which satisfy the distortion constraints. In [6], the case $N = 2$ was calculated explicitly; however, such an explicit calculation appears quite involved for general $N$ due to the discussion of various cases when some of the distortion constraints are not tight. In the sequel we approach the problem by showing that a jointly Gaussian forward test channel is optimal.

Note that if we choose to enforce only a subset of the distortion constraints, the rate for such a restriction gives a lower bound on $R_{HB}(D_1, D_2, \ldots, D_N)$. By taking all the non-empty subsets of the distortion constraints, labeled by elements of $I_N = \{1, 2, \ldots, N\}$, a total of $2^N - 1$ lower bounds are available and clearly the maximum of them is also a lower bound. More precisely, we are interested in $\max_{\mathcal{A}_D \subseteq I_N} R^*_{HB}(\mathcal{A}_D)$, where $R^*_{HB}(\mathcal{A}_D)$ is defined in the sequel explicitly in terms of the distortion constraints only; note that if $i \in \mathcal{A}_D$, $D_i$ is still the distortion constraint for the decoder with side information $Y_i$. We next derive one of these lower bounds using all the constraints $(D_1, D_2, \ldots, D_N)$, i.e., $\mathcal{A}_D = I_N$; a similar derivation applies to the case with any subset $\mathcal{A}_D \subset I_N$. Using (50) we have,

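$$\begin{aligned}
& \sum_{k=1}^{N} I(X; W_k|Y_k, W_{k+1}, W_{k+2}, \ldots, W_N) \\
&\quad = h(X|Y_N) - h(X|Y_1 W_1^N) - h(X|Y_N W_N) + h(X|Y_{N-1} W_N) - h(X|Y_{N-1} W_{N-1}^N) + \ldots + h(X|Y_1 W_2^N) \\
&\quad \overset{(a)}{=} h(X|Y_N) - h(X|Y_1 W_1^N) - [h(X|Y_N W_N) - h(X|Y_{N-1} Y_N W_N)] - \ldots - [h(X|Y_2 W_2^N) - h(X|Y_1 Y_2 W_2^N)] \\
&\quad = h(X|Y_N) - h(X|Y_1 W_1^N) - I(X; Y_{N-1}|Y_N W_N) - I(X; Y_{N-2}|Y_{N-1} W_{N-1}^N) - \ldots - I(X; Y_1|Y_2 W_2^N) \\
&\quad \overset{(b)}{=} h(X|Y_N) - h(X|Y_1 W_1^N) - [h(Y_{N-1}|Y_N W_N) - h(Y_{N-1}|X Y_N)] - \ldots - [h(Y_1|Y_2 W_2^N) - h(Y_1|Y_2 X)] \\
&\quad = h(X|Y_N) + \sum_{k=2}^{N} h(Y_{k-1}|X Y_k) - \sum_{k=2}^{N} h(Y_{k-1}|Y_k W_k^N) - h(X|Y_1 W_1^N),
\end{aligned}$$

where $W_k^N$ denotes $(W_k, W_{k+1}, \ldots, W_N)$, (a) is because of the Markov string $X \leftrightarrow (Y_{k-1} W_k^N) \leftrightarrow Y_k$, and (b) is because of the Markov string $W_k^N \leftrightarrow (X Y_k) \leftrightarrow Y_{k-1}$, both of which are consequences of $W_1^N \leftrightarrow X \leftrightarrow Y_{k-1} \leftrightarrow Y_k$. The first two terms depend only on the source and side information distribution $P_{X Y_1 \ldots Y_N}$, and we now seek to bound the latter two terms, for which we have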

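$$h(X|Y_1 W_1^N) = h(X - E(X|Y_1 W_1^N)\,|\,Y_1 W_1^N) \leq h(X - E(X|Y_1 W_1^N)) \leq h(\mathcal{N}(0, D_1)) = \frac{1}{2} \log(2\pi e D_1), \tag{51}$$

where the second inequality is because the Gaussian distribution maximizes the entropy for a given second moment, and $E(X - E(X|Y_1 W_1^N))^2 \leq D_1$ by the existence of the decoding function $f_1$. Next define

$$\gamma_k = \frac{\sum_{i=1}^{k-1} \sigma_i^2}{\sum_{i=1}^{k} \sigma_i^2}, \quad k = 2, 3, \ldots, N, \tag{52}$$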

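and write the following

\begin{align}
Y_{k-1} = X + \sum_{i=1}^{k-1} N_i &= X + \sum_{i=1}^{k-1} N_i + \gamma_k \sum_{i=1}^{k} N_i - \gamma_k \sum_{i=1}^{k} N_i \tag{53} \\
&= \gamma_k \Big(X + \sum_{i=1}^{k} N_i\Big) + (1 - \gamma_k) X + \Big[\sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i\Big] \tag{54} \\
&= \gamma_k Y_k + (1 - \gamma_k) X + \Big[\sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i\Big]. \tag{55}
\end{align}

Notice that

$$E\Big[Y_k \Big(\sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i\Big)\Big] = \sum_{i=1}^{k-1} \sigma_i^2 - \gamma_k \sum_{i=1}^{k} \sigma_i^2 = 0, \tag{56}$$

and $Y_k$ and $(\sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i)$ are jointly Gaussian, which implies that they are independent. Furthermore, because $(\sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i)$ is independent of $X$, the Markov string $(Y_1, Y_2, \ldots, Y_N) \leftrightarrow X \leftrightarrow (W_1, W_2, \ldots, W_N)$ implies that it is also independent of $(W_1, W_2, \ldots, W_N)$. It follows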

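\begin{align}
h(Y_{k-1}|Y_k W_k^N) &= h\Big(\gamma_k Y_k + (1 - \gamma_k) X + \sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i \,\Big|\, Y_k W_k^N\Big) \tag{57} \\
&= h\Big((1 - \gamma_k) X + \sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i \,\Big|\, Y_k W_k^N\Big) \tag{58} \\
&= h\Big((1 - \gamma_k)(X - E(X|Y_k W_k^N)) + \sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i \,\Big|\, Y_k W_k^N\Big) \tag{59} \\
&\leq h\Big((1 - \gamma_k)(X - E(X|Y_k W_k^N)) + \sum_{i=1}^{k-1} N_i - \gamma_k \sum_{i=1}^{k} N_i\Big). \tag{60}
\end{align}

By the aforementioned independence relation, the variance of the term in the bracket is bounded above by

$$\tilde{D}_k = (1 - \gamma_k)^2 D_k + (1 - \gamma_k)^2 \sum_{i=1}^{k-1} \sigma_i^2 + \gamma_k^2 \sigma_k^2. \tag{61}$$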

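Define the following quantities

$$K_1 \triangleq h(X|Y_N) = \frac{1}{2} \log \frac{2\pi e\, \sigma_x^2 \sum_{i=1}^{N} \sigma_i^2}{\sigma_x^2 + \sum_{i=1}^{N} \sigma_i^2}, \tag{62}$$

$$K_k \triangleq h(Y_{k-1}|X Y_k) = \frac{1}{2} \log \frac{2\pi e\, \sigma_k^2 \sum_{i=1}^{k-1} \sigma_i^2}{\sum_{i=1}^{k} \sigma_i^2}, \quad k = 2, 3, \ldots, N. \tag{63}$$

Summarizing the bounds in (51) and (60), we have

$$R_{HB}(D_1, D_2, \ldots, D_N) \geq \sum_{k=1}^{N} K_k - \frac{1}{2} \sum_{k=1}^{N} \log(2\pi e \tilde{D}_k) \triangleq R^*_{HB}(I_N), \tag{64}$$

where for convenience we define $\tilde{D}_1 = D_1$.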
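The lower bound assembled from $\gamma_k$, $\tilde{D}_k$ and $K_k$ can be evaluated directly. The sketch below is one possible implementation, written in bits and under the model $Y_k = X + \sum_{i \leq k} N_i$; the function name, the argument layout and the test values are assumptions made for illustration. Note that the $2\pi e$ factors in $K_k$ and in $\frac{1}{2}\log(2\pi e \tilde{D}_k)$ cancel, so they do not appear in the code.

```python
import math

def hb_lower_bound_bits(var_x, noise_vars, D):
    """Evaluate sum_k K_k - (1/2) sum_k log2(2*pi*e*Dtilde_k) with all N constraints.

    var_x      : variance of the source X
    noise_vars : [sigma_1^2, ..., sigma_N^2], variances of N_1, ..., N_N
    D          : [D_1, ..., D_N], distortion targets (Dtilde_1 = D_1)
    """
    N = len(noise_vars)
    # s[k-1] = sum_{i<=k} sigma_i^2, the total noise variance in Y_k
    s = [sum(noise_vars[:k + 1]) for k in range(N)]
    # k = 1 term: h(X|Y_N) - (1/2) log(2 pi e D_1)
    total = 0.5 * math.log2(var_x * s[-1] / (var_x + s[-1]) / D[0])
    for k in range(2, N + 1):
        gamma_k = s[k - 2] / s[k - 1]
        d_tilde = ((1 - gamma_k) ** 2 * (D[k - 1] + s[k - 2])
                   + gamma_k ** 2 * noise_vars[k - 1])
        var_prev_given = s[k - 2] * noise_vars[k - 1] / s[k - 1]  # Var(Y_{k-1}|X,Y_k)
        total += 0.5 * math.log2(var_prev_given / d_tilde)
    return total

# Sanity check (assumed example): a single description W = X + Z with Var(Z) = 1
# serves both decoders, so D_1 = 1/3 and D_2 = 0.4 when all variances equal 1.
bound = hb_lower_bound_bits(1.0, [1.0, 1.0], [1.0 / 3.0, 0.4])
print(f"lower bound = {bound:.4f} bits")
```

In this example the bound coincides with $I(X; W|Y_2) = \frac{1}{2}\log_2\frac{2/3}{0.4}$, the rate of sending the common description to the weaker decoder, which is a useful consistency check on the formulas.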

To show that $\max_{\mathcal{A}_D \subseteq I_N} R^*_{HB}(\mathcal{A}_D)$ is indeed achievable, construct the random variables $(W_1^*, W_2^*, \ldots, W_N^*)$ as follows. Assume that $D_k \leq E[X - E(X|Y_k)]^2$ for each $k = 1, 2, \ldots, N$, because otherwise this distortion requirement can be ignored completely.

[Construction of $(W_1^*, W_2^*, \ldots, W_N^*)$]

1) For each $k = 1, 2, \ldots, N$, determine the variance $\sigma^2_{Z_k}$ of a Gaussian random variable $Z_k$ such that $D_k = E[X - E(X|Y_k, X + Z_k)]^2$.

2) Rank the variances $\sigma^2_{Z_k}$ in an increasing order, and let $\omega(k)$ denote the rank of $\sigma^2_{Z_k}$.

3) Calculate $\sigma^2_{Z_1'} = \sigma^2_{Z_{\omega^{-1}(1)}}$, and $\sigma^2_{Z_k'} = \sigma^2_{Z_{\omega^{-1}(k)}} - \sigma^2_{Z_{\omega^{-1}(k-1)}}$ for $k = 2, 3, \ldots, N$.

4) Construct a set of independent zero-mean Gaussian random variables $(Z_1', Z_2', \ldots, Z_N')$ to have variances $(\sigma^2_{Z_1'}, \sigma^2_{Z_2'}, \ldots, \sigma^2_{Z_N'})$.

5) Construct a set of random variables $(W_1^*, W_2^*, \ldots, W_N^*)$ as

$$W_k^* = X + \sum_{i=1}^{\omega(k)} Z_i'. \tag{65}$$
Next we show that this construction of $(W_1^*, W_2^*, \ldots, W_N^*)$ achieves one of the aforementioned lower bounds and thus is an optimal forward test channel. Choose the set $\mathcal{A}^*_D = \{k : \omega(k) < \omega(j) \text{ for all } j > k\}$, and denote the rank (in increasing order) of its element $k$ as $r(k)$. Clearly by the construction we have

$$\begin{aligned}
\sum_{k=1}^{N} I(X; W_k^*|Y_k, W_{k+1}^*, W_{k+2}^*, \ldots, W_N^*)
&= \cdots + h(X|Y_{r^{-1}(1)} W^*_{r^{-1}(2)}) - \cdots \\
&= \cdots - [h(Y_{r^{-1}(1)}|Y_{r^{-1}(2)} W^*_{r^{-1}(2)}) - h(Y_{r^{-1}(1)}|X Y_{r^{-1}(2)})]
\end{aligned}$$

because of the construction of $(W_1^*, W_2^*, \ldots, W_N^*)$ and the fact that they are jointly Gaussian with $(X, Y_1, Y_2, \ldots, Y_N)$. Thus, we have proved the following theorem.

Theorem 7: The auxiliary random variables $(W_1^*, W_2^*, \ldots, W_N^*)$ constructed above achieve the minimum in the Heegard and Berger rate distortion function for the jointly Gaussian source and side informations.

It is clear that we can determine the set $\mathcal{A}^*_D$ before constructing $(W_1^*, W_2^*, \ldots, W_N^*)$ using the aforementioned procedure, which can simplify the construction. However, the current construction has the advantage that each $W_k^*$ is almost individually determined by $D_k$, and does not substantially depend on the other distortion constraints. This will prove to be useful for the general scalable coding problem. It is worth noting that it seemingly requires comparing $2^N - 1$ values of $R^*_{HB}(\mathcal{A}_D)$ to determine $R_{HB}(D_1, D_2, \ldots, D_N)$; however, from the forward calculation we see that in fact $O(N)$ complexity suffices.
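The construction and the selection of the active constraints can be sketched as follows. This is a minimal illustration, assuming the model $Y_k = X + \sum_{i \leq k} N_i$ and strict inequality $D_k < E[X - E(X|Y_k)]^2$ so that every $\sigma^2_{Z_k}$ is finite; the function name, the return layout and the example numbers are hypothetical. Step 1 uses the Gaussian MMSE identity $1/D_k = 1/\sigma_x^2 + 1/\sum_{i \leq k}\sigma_i^2 + 1/\sigma^2_{Z_k}$.

```python
def construct_test_channel(var_x, noise_vars, D):
    """Return (var_z, omega, var_z_increments, active_set) for the construction.

    var_z[k-1]         : variance of Z_k (Step 1)
    omega[k-1]         : rank of var_z[k-1] in increasing order, 1-based (Step 2)
    var_z_increments   : variances of Z'_1, ..., Z'_N (Step 3)
    active_set         : {k : omega(k) < omega(j) for all j > k}, 1-based
    """
    N = len(D)
    s, acc = [], 0.0
    for v in noise_vars:
        acc += v
        s.append(acc)                      # s[k-1] = noise variance in Y_k

    var_z = []
    for k in range(N):
        inv = 1.0 / D[k] - 1.0 / var_x - 1.0 / s[k]
        if inv <= 0:
            raise ValueError(f"D_{k + 1} is already met by Y_{k + 1} alone")
        var_z.append(1.0 / inv)

    order = sorted(range(N), key=lambda k: var_z[k])   # order[r-1] = omega^{-1}(r)
    omega = [0] * N
    for rank, k in enumerate(order, start=1):
        omega[k] = rank

    var_z_increments = [var_z[order[0]]] + [
        var_z[order[r]] - var_z[order[r - 1]] for r in range(1, N)
    ]

    # One backward pass over the decoders: O(N) once the ranks are known.
    active, best = [], N + 1
    for k in range(N - 1, -1, -1):
        if omega[k] < best:
            active.append(k + 1)
            best = omega[k]
    active.reverse()
    return var_z, omega, var_z_increments, active

# Assumed example: unit variances, targets chosen so that Var(Z_k) = 2, 1, 3.
var_z, omega, incr, active = construct_test_channel(
    1.0, [1.0, 1.0, 1.0], [0.4, 0.4, 0.6])
print(omega, active)
```

In this example decoder 1 is not in the active set: decoder 2 already requires a finer description, which decoder 1 also receives together with its better side information.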

This result can be interpreted using Fig. 8. On the horizontal axis, the $N$ marks stand for the $N$ random variables $(W^*_{\omega^{-1}(1)}, W^*_{\omega^{-1}(2)}, \ldots, W^*_{\omega^{-1}(N)})$, and on the vertical axis, the $N$ marks stand for the $N$ levels of side informations $(Y_1, Y_2, \ldots, Y_N)$. The random variable pairs $(W_k^*, Y_k)$ are then the points of interest on the plane, since if the $k$-th decoder has $(Y_k, W_k^*)$ the desired distortion can be achieved; the $(W_k^*, Y_k)$ pairs are in one-to-one correspondence to the $(\omega(k), k)$ pairs. Next we associate the unit square below and to the right of each integer point $(i, j)$ with a rate of value

$$R_{i,j} = I(W^*_{\omega^{-1}(i)}; Y_{j-1}|Y_j W^*_{\omega^{-1}(i+1)}) \tag{66}$$

where we define $W^*_{\omega^{-1}(N+1)} = 0$ and $Y_0 = X$. For each $k = 1, 2, \ldots, N$, if we cover the rectangle below and to the right of $(\omega(k), k)$, then the sum rate associated with the covered area is exactly $R_{HB}(D_1, D_2, \ldots, D_N)$.

With Fig. 8, the coding scheme can be understood as follows. The coding proceeds from $Y_N$ to $Y_1$, i.e., from high to low on the vertical axis; the $k$-th step ($k$-th decoder) specifies an integer point $(\omega(k), k)$ on the figure, which corresponds to a $(W_k^*, Y_k)$ pair, and additional rate is required if the area below and to the right of this point induces new area to cover. This order is illustrated in Fig. 8 along the arrows. Note that

\begin{align}
\sum_{j=1}^{k} R_{i,j} &= \sum_{j=1}^{k} I(W^*_{\omega^{-1}(i)}; Y_{j-1}|Y_j W^*_{\omega^{-1}(i+1)}) \tag{67} \\
&= \sum_{j=1}^{k} [I(W^*_{\omega^{-1}(i)}; Y_{j-1}|W^*_{\omega^{-1}(i+1)}) - I(W^*_{\omega^{-1}(i)}; Y_j|W^*_{\omega^{-1}(i+1)})] \tag{68} \\
&= I(W^*_{\omega^{-1}(i)}; X|W^*_{\omega^{-1}(i+1)}) - I(W^*_{\omega^{-1}(i)}; Y_k|W^*_{\omega^{-1}(i+1)}) \tag{69} \\
&= I(W^*_{\omega^{-1}(i)}; X|Y_k W^*_{\omega^{-1}(i+1)}) \tag{70}
\end{align}

and it is the rate for a vertical slice of height $k$ between horizontal positions $i$ and $i + 1$, which is in a quite similar form as (66). In the example figure, two of the decoders do not require additional rates. More generally, if $(\omega(k), k)$ is inside the area already covered by the previous coding steps $(N, N - 1, \ldots, k + 1)$, then this stage does not require additional rate. In fact, the corners of the final covered area specify the set $\mathcal{A}^*_D$.

The following observations are essential for the general Gaussian scalable coding problem: each unit square in Fig. 8 is not merely associated with rate $R_{i,j}$, it is in fact associated with a fraction of code $C_{i,j}$ with the following properties

1) The rate of $C_{i,j}$ is (asymptotically) $R_{i,j}$;

2) If the fractions of code associated with the area below and to the right of $(\omega(k), k)$ are available, then the decoder with side information $Y_k$ can decode within distortion $D_k$;

3) The same set of code $C_{i,j}$ can be used to fulfill only a subset of the constraints; the rate calculated by the covering area method is the quadratic Gaussian Heegard and Berger rate distortion function.

The first and second observations are straightforward by constructing the nested binning together with conditional codebooks as described in Section III, i.e., $N - 1$ conditioning stages from $W^*_{\omega^{-1}(N)}$ to $W^*_{\omega^{-1}(1)}$, where each conditioned codebook has $N$ nested levels from coarse for $Y_1$ to fine for $Y_N$. In fact, it is not necessary to use $N$ nested levels for each codebook, but we do so for simplicity of understanding. The last property is due to the inherent Markov string among $W_1^*, W_2^*, \ldots, W_N^*$ and $X$.

B. Scalable coding with joint Gaussian side informations 

Now consider the scalable coding problem where side informations and distortions are given by a permutation $\pi(\cdot)$ of that in the last subsection, i.e., $Y_i' = Y_{\pi(i)}$ and $D_i' = D_{\pi(i)}$. We next show that the identically permuted set of random variables $(W_1^*, W_2^*, \ldots, W_N^*)$ achieves the Heegard-Berger rate distortion function for any first $k$ stages, and is thus optimal. In light of the pictorial interpretation in Fig. 8, this reduces to rearranging the coded stream of $C_{i,j}$. Fig. 9 shows the effect of changing the scalable coding order.

More precisely, for a certain side information $Y_k' = Y_{\pi(k)}$, define the following sets:

$$\mathcal{C}(k) = \{\pi(i) : i < k,\ \pi(i) > \pi(k)\} \tag{71}$$
$$\mathcal{E}_{>}(k) = \{\pi(i) : i < k,\ \pi(i) < \pi(k),\ \omega(\pi(i)) > \omega(\pi(k))\}, \tag{72}$$

and the following function

$$E(k) = \max\big[\{\pi(i) : i < k,\ \pi(i) < \pi(k),\ \omega(\pi(i)) < \omega(\pi(k))\} \cup \{0\}\big], \tag{73}$$

and let $Y_0 = X$. Let the set of integers $\mathcal{E}_{>}(k)$ be ordered increasingly, and the rank of its element $j$ be $r(j)$. Denote the set of random variables $\{W_j^* : j \in \mathcal{C}\}$ as $W^*_{\mathcal{C}}$ for an integer set $\mathcal{C}$. The following $k$-th stage rate is achievable for $k = 1, 2, \ldots, N$

$$R_k = \cdots + I(Y_{E(k)}; W^*_{\pi(k)}|Y_{\pi(k)} W^*_{\mathcal{C}(k)} W^*_{\mathcal{E}_{>}(k)}).$$

It is clear that this rate corresponds exactly to the densely shaded region in Fig. 9, which is the sum of the rates of the fractions of code $C_{i,j}$ as described above. The property of these fractions of code $C_{i,j}$ thus implies the following.

Fig. 9. An illustration of incremental rate for scalable coding. The denser shaded region gives the incremental rate $R_k$ for the stage with side information $Y_k$.
Theorem 8: The Gaussian scalable coding achievable rate region for distortion vector $(D_{\pi(1)}, D_{\pi(2)}, \ldots, D_{\pi(N)})$ is the set of rate vectors $(R_1, R_2, \ldots, R_N)$ satisfying

$$\sum_{i=1}^{k} R_i \geq R_{HB}(D_{\pi(1)}, D_{\pi(2)}, \ldots, D_{\pi(k)}), \quad k = 1, 2, \ldots, N \tag{74}$$

where the side informations are $(Y_{\pi(1)}, Y_{\pi(2)}, \ldots, Y_{\pi(k)})$. Furthermore, it is achievable by a jointly Gaussian codebook with nested binning.
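Condition (74) is a set of prefix-sum constraints and is easy to check mechanically. The sketch below assumes that the Heegard-Berger values for the first $k$ stages have already been computed by some other means and are passed in as a list; both function names and the numbers in the example are hypothetical.

```python
from itertools import accumulate

def in_scalable_region(rates, hb_prefix_rates, tol=1e-12):
    """Check (74): the first k rates must sum to at least the k-th prefix value,
    where hb_prefix_rates[k-1] stands for R_HB(D_pi(1), ..., D_pi(k))."""
    if len(rates) != len(hb_prefix_rates):
        raise ValueError("one rate per stage is required")
    return all(total >= bound - tol
               for total, bound in zip(accumulate(rates), hb_prefix_rates))

def minimal_incremental_rates(hb_prefix_rates):
    """Rate vector meeting every constraint in (74) with equality."""
    increments, previous = [], 0.0
    for bound in hb_prefix_rates:
        increments.append(bound - previous)
        previous = bound
    return increments

# Assumed prefix values, in bits, for a three-stage example.
prefix = [0.5, 0.8, 1.0]
corner = minimal_incremental_rates(prefix)
print(corner, in_scalable_region(corner, prefix))
```

The corner point returned by `minimal_incremental_rates` is the rate vector at which every stage operates on its Heegard-Berger prefix value.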

An immediate consequence of this result is the following corollary. 

Corollary 5: A distortion vector $(D_{\pi(1)}, D_{\pi(2)}, \ldots, D_{\pi(N)})$ is perfectly scalable along side informations $(Y_{\pi(1)}, Y_{\pi(2)}, \ldots, Y_{\pi(N)})$ for the jointly Gaussian source if and only if $R_{HB}(D_{\pi(1)}, D_{\pi(2)}, \ldots, D_{\pi(k)}) = R^*_{X|Y_{\pi(k)}}(D_{\pi(k)})$ for each $k = 1, 2, \ldots, N$.

This corollary applies to one of the important special cases where $D_1 = D_2 = \ldots = D_N$ and $\pi(k) = N - k + 1$ for each $k$, i.e., when all the decoders have the same distortion requirement, and the scalable order is along a decreasing order of side information quality. This implies that at least for the Gaussian case, an opportunistic coding strategy does exist when the distortion requirement is the same for all the users.

VII. Conclusion 

We studied the problem of scalable source coding with reversely degraded side-information and gave two inner bounds as well as two outer bounds. These bounds are tight for special cases such as one lossless decoder and under certain deterministic distortion measures. Furthermore, we provided a complete solution for the Gaussian source with quadratic distortion measure and any number of jointly Gaussian side informations. The problem of perfect scalability was investigated, and the gap between the inner and outer bounds was shown to be bounded by a constant under the MSE distortion measure. For the doubly symmetric binary source with Hamming distortion, we provided partial results on the rate-distortion region. The results illustrate the difference between lossless and lossy source coding: though a universal approach exists with uncertain side informations at the decoder for the lossless case, such uncertainty generally causes loss of performance in the lossy case.

Appendix I 

Notation and Basic Properties of Typical Sequences 

We will follow the definition of typicality in [11], but use a slightly different notation to make the small
positive quantity $\delta$ explicit (see [5]).

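Definition 4: A sequence $\mathbf{x} \in \mathcal{X}^n$ is said to be $\delta$-strongly-typical with respect to a distribution $P_X(x)$ on $\mathcal{X}$ if

1) For all $a \in \mathcal{X}$ with $P_X(a) > 0$,

$$\left| \frac{1}{n} N(a|\mathbf{x}) - P_X(a) \right| < \delta, \qquad (75)$$

2) For all $a \in \mathcal{X}$ with $P_X(a) = 0$, $N(a|\mathbf{x}) = 0$,

where $N(a|\mathbf{x})$ is the number of occurrences of the symbol $a$ in the sequence $\mathbf{x}$. The set of sequences $\mathbf{x} \in \mathcal{X}^n$ that are $\delta$-strongly-typical is called the $\delta$-strongly-typical set and denoted as $T^{\delta}_{[X]}$, where the dimension $n$ is dropped.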
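Definition 4 can be checked mechanically for a finite alphabet. The following is a minimal sketch, assuming the distribution is supplied as a Python dictionary from symbols to probabilities; the function name `is_strongly_typical` and its arguments are illustrative choices, not notation from the paper.

```python
from collections import Counter

def is_strongly_typical(seq, pmf, delta):
    """Check delta-strong typicality of seq with respect to pmf (Definition 4).

    seq   : non-empty sequence of symbols
    pmf   : dict mapping every alphabet symbol to its probability (assumed input format)
    delta : the small positive quantity in condition 1)
    """
    n = len(seq)
    if n == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(seq)
    # A symbol outside the alphabet has probability zero, so it must not occur.
    if any(a not in pmf for a in counts):
        return False
    for a, p in pmf.items():
        if p > 0:
            # Condition 1): empirical frequency within delta of P_X(a).
            if abs(counts[a] / n - p) >= delta:
                return False
        elif counts[a] != 0:
            # Condition 2): zero-probability symbols never appear.
            return False
    return True
```

For instance, under a uniform binary distribution the sequence `[0, 1, 0, 1]` passes with `delta = 0.1`, while `[0, 0, 0, 1]` fails because its empirical frequencies deviate by 0.25.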

The following properties are well-known and will be used in the proof: 

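1) Given a $\mathbf{x} \in T^{\delta}_{[X]}$, for a $\mathbf{y}$ whose component is drawn i.i.d. according to $P_Y$ and any $\delta' > \delta$, we have

$$2^{-n(I(X;Y)+\lambda_1)} \le P[(\mathbf{x}, \mathbf{y}) \in T^{\delta'}_{[XY]}] \le 2^{-n(I(X;Y)-\lambda_1)} \qquad (76)$$

where $\lambda_1$ is a small positive quantity, $\lambda_1 \to 0$ as $n \to \infty$ and both $\delta, \delta' \to 0$.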

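2) Similarly, given $(\mathbf{x}, \mathbf{y}) \in T^{\delta'}_{[XY]}$, for any $\delta'' > \delta'$, let the component of $\mathbf{z}$ be drawn i.i.d. according to the conditional marginal $P_{Z|Y}(\cdot|y_i)$, then

$$2^{-n(I(X;Z|Y)+\lambda_2)} \le P[(\mathbf{x}, \mathbf{y}, \mathbf{z}) \in T^{\delta''}_{[XYZ]}] \le 2^{-n(I(X;Z|Y)-\lambda_2)} \qquad (77)$$

where $\lambda_2$ is a small positive quantity, $\lambda_2 \to 0$ as $n \to \infty$ and both $\delta', \delta'' \to 0$.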
3) Markov Lemma [16]: If $X \leftrightarrow Y \leftrightarrow Z$ is a Markov string, and $\mathbf{X}$ and $\mathbf{Y}$ are such that their components are
drawn independently according to $P_{XY}$, then for all $\delta > 0$

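$$\lim_{n \to \infty} P[(\mathbf{X}, \mathbf{z}) \in T^{\delta}_{[XZ]} \mid (\mathbf{Y}, \mathbf{z}) \in T^{\delta}_{[YZ]}] = 1. \qquad (78)$$

Furthermore,

$$\lim_{n \to \infty} P[(\mathbf{X}, \mathbf{Y}, \mathbf{z}) \in T^{\delta}_{[XYZ]} \mid (\mathbf{Y}, \mathbf{z}) \in T^{\delta}_{[YZ]}] = 1. \qquad (79)$$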

Appendix II 
Proof of Theorem 1

Codebook generation: Let a probability distribution $P_{VW_1W_2XY_1Y_2} = P_{XVW_1W_2} P_{Y_1|X} P_{Y_2|Y_1}$ and two reconstruction functions $f_1(Y_1, W_1)$ and $f_2(Y_2, W_2)$ be given. First construct $2^{nR_A}$ coarser bins and $2^{n(R_A+R'_A)}$ finer bins, where $R_A$ and $R'_A$ are to be specified later. Generate $2^{nR_V}$ length-$n$ codewords according to $P_V(\cdot)$, and denote this set of codewords as $\mathcal{C}_V$; assign each of them to one of the finer bins independently. For each codeword $\mathbf{v} \in \mathcal{C}_V$, generate $2^{nR_{W_1}}$ length-$n$ codewords according to $P_{W_1|V}(\mathbf{w}_1|\mathbf{v}) = \prod_{k=1}^{n} P_{W_1|V}(w_{1,k}|v_k)$, and denote this set of codewords as $\mathcal{C}_{W_1}(\mathbf{v})$; independently assign each codeword to one of the $2^{nR_B}$ bins. Again for each $\mathbf{v}$ codeword, independently generate $2^{nR_{W_2}}$ length-$n$ codewords according to $P_{W_2|V}(\mathbf{w}_2|\mathbf{v}) = \prod_{k=1}^{n} P_{W_2|V}(w_{2,k}|v_k)$, and denote this set of codewords as $\mathcal{C}_{W_2}(\mathbf{v})$; independently assign each codeword to one of the $2^{nR_C}$ bins. Reveal this codebook to the encoders and decoders.
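The coarse/fine bin structure used for the $\mathbf{v}$ codewords can be illustrated with a toy simulation. This is only a sketch under stated assumptions: codewords are represented by integer indices, the bin assignment is taken to be uniform at random (the text only says "independently"), and the helper names `build_nested_bins` and `candidates` are hypothetical.

```python
import random

def build_nested_bins(num_codewords, num_coarse, fine_per_coarse, seed=0):
    """Assign each codeword index to a (coarse bin i, finer bin j inside i) pair.

    num_coarse plays the role of 2^{n R_A}; fine_per_coarse plays the role
    of 2^{n R'_A}, so there are num_coarse * fine_per_coarse finer bins.
    """
    rng = random.Random(seed)
    num_fine = num_coarse * fine_per_coarse
    bins = {}
    for v in range(num_codewords):
        fine = rng.randrange(num_fine)           # independent, uniform finer-bin choice (assumption)
        bins[v] = divmod(fine, fine_per_coarse)  # (coarse index i, finer index j within i)
    return bins

def candidates(bins, i, j=None):
    """Codewords a decoder has to test: bin i only (first stage) or bin (i, j) (second stage)."""
    return [v for v, (ci, cj) in bins.items() if ci == i and (j is None or cj == j)]
```

The second-stage candidate list is always a subset of the first-stage list, which mirrors why the decoder with the weaker side information needs the extra index $j$ to resolve the same codeword.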

Encoding: For a given $\mathbf{x}$, find in $\mathcal{C}_V$ a codeword $\mathbf{v}^*$ such that $(\mathbf{x}, \mathbf{v}^*) \in T^{\delta'}_{[XV]}$; calculate the coarser bin index $i(\mathbf{v}^*)$, and the finer bin index within the coarser bin $j(\mathbf{v}^*)$. Then in the $\mathcal{C}_{W_1}(\mathbf{v}^*)$ codebook, find a codeword $\mathbf{w}_1^*$ such that $(\mathbf{w}_1^*, \mathbf{v}^*, \mathbf{x}) \in T^{\delta''}_{[W_1VX]}$, and calculate its corresponding bin index $k$. In the $\mathcal{C}_{W_2}(\mathbf{v}^*)$ codebook, find a codeword $\mathbf{w}_2^*$ such that $(\mathbf{w}_2^*, \mathbf{v}^*, \mathbf{x}) \in T^{\delta''}_{[W_2VX]}$, and calculate its corresponding bin index $l$. The first-stage encoder sends $i$ and $k$, and the second-stage encoder sends $j$ and $l$. In the above procedure, if there is more than one joint-typical sequence, choose the least; if there is none, choose a default codeword and declare an error.


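Decoding: The first stage decoder finds $\hat{\mathbf{v}}$ in the coarser bin $i$, such that $(\hat{\mathbf{v}}, \mathbf{y}_1) \in T^{\delta''}_{[VY_1]}$; then in the $\mathcal{C}_{W_1}(\hat{\mathbf{v}})$ codebook, it finds $\hat{\mathbf{w}}_1$ such that $(\hat{\mathbf{w}}_1, \hat{\mathbf{v}}, \mathbf{y}_1) \in T^{\delta''}_{[W_1VY_1]}$. In the second stage, the decoder finds $\hat{\mathbf{v}}$ in the finer bin specified by $(i, j)$ such that $(\hat{\mathbf{v}}, \mathbf{y}_2) \in T^{\delta''}_{[VY_2]}$; then in the $\mathcal{C}_{W_2}(\hat{\mathbf{v}})$ codebook, it finds $\hat{\mathbf{w}}_2$ such that $(\hat{\mathbf{w}}_2, \hat{\mathbf{v}}, \mathbf{y}_2) \in T^{\delta''}_{[W_2VY_2]}$. In the above procedure, if there is none or there is more than one, an error is declared and the decoding stops. The first decoder reconstructs as $\hat{x}_{1,k} = f_1(\hat{w}_{1,k}, y_{1,k})$ and the second decoder as $\hat{x}_{2,k} = f_2(\hat{w}_{2,k}, y_{2,k})$.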

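Probability of error: First define the encoding errors:

$$E_1 = E_0^c \cap \{\forall \mathbf{v} \in \mathcal{C}_V, (\mathbf{X}, \mathbf{v}) \notin T^{\delta'}_{[XV]}\}$$

$$E_2 = E_0^c \cap E_1^c \cap \{\forall \mathbf{w}_1 \in \mathcal{C}_{W_1}(\mathbf{v}^*), (\mathbf{w}_1, \mathbf{v}^*, \mathbf{X}) \notin T^{\delta''}_{[W_1VX]}\}$$

$$E_3 = E_0^c \cap E_1^c \cap \{\forall \mathbf{w}_2 \in \mathcal{C}_{W_2}(\mathbf{v}^*), (\mathbf{w}_2, \mathbf{v}^*, \mathbf{X}) \notin T^{\delta''}_{[W_2VX]}\}.$$

Next define the decoding errors:

$$E_6 = E_0^c \cap E_1^c \cap \{\exists \mathbf{v}' \neq \mathbf{v}^* : i(\mathbf{v}') = i(\mathbf{v}^*) \text{ and } (\mathbf{v}', \mathbf{Y}_1) \in T^{\delta''}_{[VY_1]}\}$$

$$E_7 = E_0^c \cap E_1^c \cap \{\exists \mathbf{v}' \neq \mathbf{v}^* : i(\mathbf{v}') = i(\mathbf{v}^*) \text{ and } j(\mathbf{v}') = j(\mathbf{v}^*) \text{ and } (\mathbf{v}', \mathbf{Y}_2) \in T^{\delta''}_{[VY_2]}\}$$

$$E_8 = E_0^c \cap E_1^c \cap E_2^c \cap E_4^c \cap E_6^c \cap \{(\mathbf{w}_1^*, \mathbf{v}^*, \mathbf{X}, \mathbf{Y}_1) \notin T^{\delta''}_{[W_1VXY_1]}\}$$

$$E_9 = E_0^c \cap E_1^c \cap E_3^c \cap E_5^c \cap E_7^c \cap \{(\mathbf{w}_2^*, \mathbf{v}^*, \mathbf{X}, \mathbf{Y}_2) \notin T^{\delta''}_{[W_2VXY_2]}\}$$

$$E_{10} = E_0^c \cap E_1^c \cap E_2^c \cap E_4^c \cap E_6^c \cap \{\exists \mathbf{w}_1' \neq \mathbf{w}_1^* : k(\mathbf{w}_1') = k(\mathbf{w}_1^*) \text{ and } (\mathbf{w}_1', \mathbf{v}^*, \mathbf{Y}_1) \in T^{\delta''}_{[W_1VY_1]}\}$$

$$E_{11} = E_0^c \cap E_1^c \cap E_3^c \cap E_5^c \cap E_7^c \cap \{\exists \mathbf{w}_2' \neq \mathbf{w}_2^* : l(\mathbf{w}_2') = l(\mathbf{w}_2^*) \text{ and } (\mathbf{w}_2', \mathbf{v}^*, \mathbf{Y}_2) \in T^{\delta''}_{[W_2VY_2]}\}$$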

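Apparently, for any $\epsilon'$, for $n > n_1(\epsilon', \delta)$, $P(E_0) < \epsilon'$. We also have

$$P(E_1) \le P(\mathbf{X} \in T^{\delta}_{[X]}) \, P(\{\forall \mathbf{v} \in \mathcal{C}_V, (\mathbf{X}, \mathbf{v}) \notin T^{\delta'}_{[XV]}\} \mid \mathbf{X} \in T^{\delta}_{[X]}) \le (1 - 2^{-n(I(X;V)+\lambda)})^{2^{nR_V}} \le \exp(-2^{-n(I(X;V)+\lambda-R_V)}), \qquad (80)$$

where Property 1) of the typical sequences and $(1 - x)^y \le e^{-xy}$ are used. Thus $P(E_1) \to 0$, provided that
$R_V > I(X;V) + \lambda$.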
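The double-exponential decay in (80) can be checked numerically. The sketch below assumes the per-codeword hit probability is exactly its lower bound $2^{-n(I(X;V)+\lambda)}$; the function name and the parameter values in the usage note are illustrative only.

```python
import math

def miss_probability(n, rate_v, mutual_info, lam):
    """Compare (1 - p)^m with its upper bound exp(-m p).

    p = 2^{-n (I + lam)} is the assumed probability that one random codeword
    is jointly typical with x; m = 2^{n R_V} is the number of codewords.
    """
    p = 2.0 ** (-n * (mutual_info + lam))
    m = 2.0 ** (n * rate_v)
    exact = math.exp(m * math.log1p(-p))  # (1 - p)^m, computed stably
    bound = math.exp(-m * p)              # exp(-2^{n (R_V - I - lam)})
    return exact, bound
```

With `rate_v = 0.6`, `mutual_info = 0.5` and `lam = 0.05`, the exponent $n(R_V - I(X;V) - \lambda)$ grows linearly in $n$, so the bound falls from about $e^{-2}$ at $n = 20$ to about $e^{-4}$ at $n = 40$, and the exact value never exceeds the bound.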

$P(E_4)$ and $P(E_5)$ both tend to zero due to the Markov lemma; it requires the condition $(\mathbf{v}^*, \mathbf{X}) \in T^{\delta'}_{[VX]}$ to hold, which is indeed so given that $E_1$ does not happen. Similarly, both $P(E_8)$ and $P(E_9)$ tend to zero for the same reason. Notice that if $(\mathbf{v}^*, \mathbf{X}, \mathbf{Y}_1) \in T^{\delta''}_{[VXY_1]}$, then $(\mathbf{v}^*, \mathbf{Y}_1) \in T^{\delta''}_{[VY_1]}$; thus $\mathbf{v}^*$ can be correctly decoded if there are no other codewords in the same bin satisfying the typicality test.
Conditioned on $E_1^c$, we have $(\mathbf{X}, \mathbf{v}^*) \in T^{\delta'}_{[XV]}$. Thus

$$P(E_2) \le \exp(-2^{-n(I(X;W_1|V)+\lambda_1-R_{W_1})}) \qquad (81)$$

where property 2) of the typical sequences is used. Thus $P(E_2)$ tends to zero provided $R_{W_1} > I(X;W_1|V) + \lambda_1$.
Similarly, $P(E_3)$ tends to zero provided $R_{W_2} > I(X;W_2|V) + \lambda_2$.

Conditioned on $E_0^c$, $\mathbf{y}_1 \in T^{\delta}_{[Y_1]}$. Since the codewords in $\mathcal{C}_V$ are generated independently according to $P_V(\cdot)$,

$$P(E_6) \le \sum_{\mathbf{v}' \neq \mathbf{v}^*} 2^{-nR_A} 2^{-n(I(Y_1;V)-\lambda_3)} \le 2^{n(R_V-R_A-I(Y_1;V)+\lambda_3)} \qquad (82)$$

where we have used property 2) of the typical sequences and the fact that the bin to which $\mathbf{v}$ is assigned is independent.
Thus $P(E_6) \to 0$ provided that $R_A > R_V - I(Y_1;V) + \lambda_3$. Similarly, $P(E_7) \to 0$ provided that $R_A + R'_A > R_V - I(Y_2;V) + \lambda_4$.

Conditioned on $E_4^c$, $(\mathbf{v}^*, \mathbf{Y}_1) \in T^{\delta''}_{[VY_1]}$. Thus

$$P(E_{10}) \le 2^{nR_{W_1}} 2^{-nR_B} 2^{-n(I(Y_1;W_1|V)-\lambda_5)} = 2^{n(R_{W_1}-R_B-I(Y_1;W_1|V)+\lambda_5)} \qquad (83)$$

where property 3) of the typical sequences is used. Thus $P(E_{10})$ tends to zero provided $R_B > R_{W_1} - I(Y_1;W_1|V) + \lambda_5$. Similarly, $P(E_{11})$ tends to zero provided $R_C > R_{W_2} - I(Y_2;W_2|V) + \lambda_6$. Thus the rates
only need to satisfy

$$R_1 = R_A + R_B > I(X;VW_1|Y_1) + \lambda' \qquad (84)$$

$$R_1 + R_2 = R_A + R'_A + R_B + R_C > I(X;VW_2|Y_2) + I(X;W_1|VY_1) + \lambda'' \qquad (85)$$

where $\lambda'$ and $\lambda''$ are both small positive quantities that vanish as $\delta \to 0$ and $n \to \infty$; then $P_e \le \sum_{i=0}^{11} P(E_i) \to 0$. It only remains to show that the distortion constraints are satisfied as well. When no error occurs, $(\mathbf{W}_1, \mathbf{X}, \mathbf{Y}_1) \in T^{\delta''}_{[W_1XY_1]}$ and $(\mathbf{W}_2, \mathbf{X}, \mathbf{Y}_2) \in T^{\delta''}_{[W_2XY_2]}$. By a standard argument using the definition of the typical
sequences, it can be shown that

$$d(\mathbf{x}, \hat{\mathbf{x}}_1) \le E\,d[X, f_1(W_1, Y_1)] + \epsilon' \qquad (86)$$

where $\epsilon' = \max(d(x, \hat{x}))(3|\mathcal{V} \times \mathcal{W}_1 \times \mathcal{X} \times \mathcal{Y}_1|\delta + P_e)$. Thus $\epsilon'$ can be made arbitrarily small by
choosing sufficiently small $\delta$ and sufficiently large $n$. Similar arguments hold for the second stage decoder. This
completes the proof. ■
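The step from the individual bin-rate conditions to (84) and (85) can be spelled out as follows. This is a sketch that ignores the vanishing $\lambda$ terms, takes $R_V$, $R_{W_1}$ and $R_{W_2}$ arbitrarily close to their lower bounds, and uses the Markov string $(V, W_1, W_2) \leftrightarrow X \leftrightarrow Y_1 \leftrightarrow Y_2$ implied by the assumed factorization of the joint distribution.

```latex
\begin{align*}
R_A &> I(X;V) - I(Y_1;V) = I(X;V|Y_1), \\
R_A + R'_A &> I(X;V) - I(Y_2;V) = I(X;V|Y_2), \\
R_B &> I(X;W_1|V) - I(Y_1;W_1|V) = I(X;W_1|VY_1), \\
R_C &> I(X;W_2|V) - I(Y_2;W_2|V) = I(X;W_2|VY_2), \\
R_A + R_B &> I(X;V|Y_1) + I(X;W_1|VY_1) = I(X;VW_1|Y_1), \\
R_A + R'_A + R_B + R_C &> I(X;V|Y_2) + I(X;W_2|VY_2) + I(X;W_1|VY_1) \\
&= I(X;VW_2|Y_2) + I(X;W_1|VY_1).
\end{align*}
```

Each difference of mutual informations collapses to a conditional mutual information because the auxiliary variable depends on the side information only through $X$ (and $V$), and the last equalities in each sum are the chain rule.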



Appendix III 
Proof of Theorem 2



Assume the existence of an $(n, M_1, M_2, D_1, D_2)$ RD SI-scalable code; then there exist encoding and decoding functions $\phi_i$ and $\psi_i$ for $i = 1, 2$. Denote $\phi_i(\mathbf{X})$ as $T_i$. $\mathbf{X}_k^-$ will be used to denote the vector $(X_1, X_2, \ldots, X_{k-1})$ and $\mathbf{X}_k^+$ to denote $(X_{k+1}, X_{k+2}, \ldots, X_n)$; the subscript $k$ will be dropped when it is clear from the context. The proof follows the same line as the converse proof in [7]. The following chain of inequalities is standard (see page 440 of [22]). Here we omit the small positive quantity $\epsilon$ for simplicity.

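$$\begin{aligned}
nR_1 &\ge H(T_1) \ge H(T_1|\mathbf{Y}_1) = I(\mathbf{X}; T_1|\mathbf{Y}_1) = \sum_{k=1}^{n} I(X_k; T_1|\mathbf{Y}_1\mathbf{X}^-) \\
&= \sum_{k=1}^{n} [H(X_k|\mathbf{Y}_1\mathbf{X}^-) - H(X_k|T_1\mathbf{Y}_1\mathbf{X}^-)] \\
&= \sum_{k=1}^{n} [H(X_k|Y_{1,k}) - H(X_k|T_1\mathbf{Y}_1\mathbf{X}^-)] \\
&\ge \sum_{k=1}^{n} I(X_k; T_1\mathbf{Y}_1^-\mathbf{Y}_1^+|Y_{1,k}).
\end{aligned}$$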

Next we bound the sum rate as follows 

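$$\begin{aligned}
n(R_1 + R_2) &\ge H(T_1T_2) \ge H(T_1T_2|\mathbf{Y}_2) = I(\mathbf{X}; T_1T_2|\mathbf{Y}_2) \\
&= I(\mathbf{X}; T_1T_2\mathbf{Y}_1|\mathbf{Y}_2) - I(\mathbf{X}; \mathbf{Y}_1|T_1T_2\mathbf{Y}_2) \\
&= \sum_{k=1}^{n} [I(X_k; T_1T_2\mathbf{Y}_1|\mathbf{Y}_2\mathbf{X}^-) - I(\mathbf{X}; Y_{1,k}|T_1T_2\mathbf{Y}_2\mathbf{Y}_1^-)].
\end{aligned}$$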

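Since $(X_k, Y_{2,k})$ is independent of $(\mathbf{X}^-, \mathbf{Y}_2^-, \mathbf{Y}_2^+)$, we have

$$I(X_k; T_1T_2\mathbf{Y}_1|\mathbf{Y}_2\mathbf{X}^-) = I(X_k; T_1T_2\mathbf{Y}_1\mathbf{Y}_2^-\mathbf{Y}_2^+\mathbf{X}^-|Y_{2,k}) \ge I(X_k; T_1T_2\mathbf{Y}_1\mathbf{Y}_2^-\mathbf{Y}_2^+|Y_{2,k}).$$

The Markov condition $Y_{1,k} \leftrightarrow (X_k, Y_{2,k}) \leftrightarrow (\mathbf{X}^-\mathbf{X}^+T_1T_2\mathbf{Y}_1^-\mathbf{Y}_2^-\mathbf{Y}_2^+)$ gives

$$I(\mathbf{X}; Y_{1,k}|T_1T_2\mathbf{Y}_2\mathbf{Y}_1^-) = I(X_k; Y_{1,k}|T_1T_2\mathbf{Y}_2\mathbf{Y}_1^-).$$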

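Thus we have

$$\begin{aligned}
n(R_1 + R_2) &\ge \sum_{k=1}^{n} [I(X_k; T_1T_2\mathbf{Y}_1\mathbf{Y}_2^-\mathbf{Y}_2^+|Y_{2,k}) - I(X_k; Y_{1,k}|T_1T_2\mathbf{Y}_2\mathbf{Y}_1^-)] \\
&= \sum_{k=1}^{n} [I(X_k; T_1T_2\mathbf{Y}_1^-\mathbf{Y}_2^-\mathbf{Y}_2^+|Y_{2,k}) + I(X_k; \mathbf{Y}_1^+|T_1T_2\mathbf{Y}_2\mathbf{Y}_1^-Y_{1,k})].
\end{aligned}$$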

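The degradedness gives $Y_{2,k} \leftrightarrow Y_{1,k} \leftrightarrow (X_k, T_1T_2, \mathbf{Y}_1^-\mathbf{Y}_1^+\mathbf{Y}_2^-\mathbf{Y}_2^+)$, which implies

$$n(R_1 + R_2) \ge \sum_{k=1}^{n} [I(X_k; T_1T_2\mathbf{Y}_1^-\mathbf{Y}_2^-\mathbf{Y}_2^+|Y_{2,k}) + I(X_k; \mathbf{Y}_1^+|T_1T_2\mathbf{Y}_1^-\mathbf{Y}_2^-\mathbf{Y}_2^+Y_{1,k})].$$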

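Define $W_{1,k} = (T_1\mathbf{Y}_1^-\mathbf{Y}_1^+)$ and $W_{2,k} = (T_1T_2\mathbf{Y}_1^-\mathbf{Y}_2^-\mathbf{Y}_2^+)$, by which we have

$$nR_1 \ge \sum_{k=1}^{n} I(X_k; W_{1,k}|Y_{1,k})$$

$$n(R_1 + R_2) \ge \sum_{k=1}^{n} [I(X_k; W_{2,k}|Y_{2,k}) + I(X_k; W_{1,k}|W_{2,k}Y_{1,k})].$$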
Therefore the Markov condition $(W_{1,k}, W_{2,k}) \leftrightarrow X_k \leftrightarrow Y_{1,k} \leftrightarrow Y_{2,k}$ is true. Next introduce the time sharing
random variable $Q$, which is independent of the multisource and uniformly distributed over $I_n$. Define $W_j = (W_{j,Q}, Q)$, $j = 1, 2$. The existence of the function $f_j$ follows by defining

$$f_1(W_1, Y_1) = \psi_{1,Q}(\phi_1(\mathbf{X}), \mathbf{Y}_1) \qquad (94)$$

$$f_2(W_2, Y_2) = \psi_{2,Q}(\phi_1(\mathbf{X}), \phi_2(\mathbf{X}), \mathbf{Y}_2) \qquad (95)$$

which leads to the fulfillment of the distortion constraints. It only remains to show that both bounds can be written
in single letter form in $W_1, W_2$, which is straightforward following the approach in (page 435 of) [22]. This
completes the proof for $\mathcal{R}_{out}(D_1, D_2) \supseteq \mathcal{R}(D_1, D_2)$. ■